Second order asymptotics in
fixed-length source coding and intrinsic randomness

Masahito Hayashi 


Abstract —Second order asymptotics of fixed-length source 
coding and intrinsic randomness are discussed under a constant
error constraint. There is a difference between the optimal rates
of fixed-length source coding and intrinsic randomness, which
never occurs in the first order asymptotics. Based on this fact, the
relation between the uniform distribution and compressed data is
also discussed. These results are valid for general information
sources as well as for independent and identical distributions. A
universal code attaining the second order optimal rate is also
constructed.

Index Terms —Second order asymptotics, Fixed-length source 
coding, Intrinsic randomness, Information spectrum, Folklore for 
source coding 

I. Introduction 

Many researchers believe that any sufficiently compressed data approaches a uniform random number. This conjecture is called Folklore for source coding (Han [1]). The main reason for this conjecture seems to be the fact that the optimal limits of both rates coincide with the entropy rate: that is, the optimal compression length equals the optimal length of intrinsic randomness (uniform random number generation) in the asymptotic first order. There is, however, no research comparing them in the asymptotic second order, even though some researchers treat the second order asymptotics for variable-length source coding [2], [3]. In this paper, taking account of the asymptotic second order, we compare them for general information sources in the fixed-length setting. In particular, by applying our results to the independent and identical distribution (i.i.d.), we show that the size of the compressed data is greater than that of intrinsic randomness with respect to the asymptotic second order. This fact implies that data generated by fixed-length source coding is not a uniform random number.

Details of the above discussion are as follows. The size of the generated data is one of the main points in data compression and intrinsic randomness. In the asymptotic setting, by approximating the size $M_n$ as $M_n = e^{na}$, we usually focus on the exponential component (exponent) $a$. A smaller size is better in data compression, but a larger size is better in intrinsic randomness. The optimal exponents $a$ of the two problems coincide. However, as will be shown in this paper, the size $M_n$ can be approximated as $M_n = e^{na + \sqrt{n}b}$. In this paper, we call the issue concerning the coefficient $a$ of $n$ the first order asymptotics, and the issue concerning the coefficient $b$ of $\sqrt{n}$ the second order asymptotics. When the information source is

M. Hayashi is with Quantum Computation and Information Project, ERATO, JST, 5-28-3, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (e-mail: masahito@qci.jst.go.jp)


the independent and identical distribution $P^n$ of a probability distribution $P$, the optimal first order coefficient is the entropy $H(P)$ in both settings. In this paper, we treat the optimization of the second order coefficient $b$ for general information sources. In particular, we treat intrinsic randomness by using half of the variational distance. The optimal second order coefficients of the two problems do not coincide in many cases. In particular, these optimal second order coefficients depend on the allowable error even in the i.i.d. case. (In contrast, it is known that the optimal first order coefficients are independent of the allowable error in the i.i.d. case when the allowable error is constant.) If the allowable error is less than 1/2, the optimal second order coefficient for source coding is strictly larger than the one for intrinsic randomness. As a consequence, when the error constraint for source coding is sufficiently small, the compressed random number is different from the uniform random number. Hence, there exists a trade-off relation between the error of compression and the error of intrinsic randomness.

However, Han [1], [4], [5] showed that the compressed data achieving the optimal rate is ‘almost’ uniform random, at least in the i.i.d. case, in fixed-length compression. Visweswariah et al. [6] and Han & Uchida [7] also treated a similar problem in the variable-length setting. One may think that Han’s result contradicts our result, but there is no contradiction. This is because Han’s error criterion between the obtained distribution and the true uniform distribution is based on the normalized KL-divergence (30), which is not as restrictive as our criterion. Thus, the distribution of the compressed data may be far from the uniform distribution under our criterion even if it is ‘almost’ the uniform distribution under his criterion. Indeed, Han [5] has already stated in his conclusion that if we adopt the variational distance, the compressed data is different from the uniform random number in the case of the stationary ergodic source. In this paper, however, using the results of the second order asymptotics, we succeed in deriving the tight trade-off relation between the variational distance from the uniform distribution and the decoding error probability of fixed-length compression in the asymptotic setting. Further, when we adopt the KL-divergence divided by $\sqrt{n}$ instead of the normalized KL-divergence, the compressed data is different from the uniform random number. Hence, the speed at which the normalized KL-divergence converges to 0 is essential.

In this paper, we mainly use the information spectrum method formulated by Han [4]. We treat the general information source, which is a general sequence $\{p_n\}$ of probability distributions without any structure. This method enables us to characterize the asymptotic performance only with the random variable $\log p_n$ (the logarithm of the likelihood) without any further assumption. In order to treat the i.i.d. case based on the above general result, it is sufficient to calculate the asymptotic behavior of the random variable $-\log p_n$. Moreover, the information spectrum method lets us treat the second order asymptotics in a manner parallel to the first order asymptotics, a large part of which is known. That is, if we can suitably formulate theorems in the second order asymptotics and establish an appropriate relation between the first order and the second order asymptotics, we can easily extend proofs concerning the first order asymptotics to the second order asymptotics, because the techniques used in the information spectrum method are quite universal. Thus, the discussion of the first order asymptotics plays an important role in our proofs of some important theorems in the second order asymptotics. Therefore, we give proofs of some theorems in the first order asymptotics even though they are known. This treatment yields short proofs of the main theorems for the second order asymptotics, with reference to the corresponding proofs for the first order asymptotics.

While we referred to the i.i.d. case in the above discussion, the Markovian case also has a similar asymptotic structure. That is, the limiting distribution of the (normalized) logarithm of the likelihood is a normal distribution. Hence, we reach the same conclusion concerning Folklore for source coding in the Markovian case. Moreover, we construct a fixed-length source code that attains the optimal rate up to the second order asymptotics, i.e., a universal fixed-length source code. We also prove the existence of a similar universal operation for intrinsic randomness. Further, in Section VI-A, we derive the optimal generation rate of intrinsic randomness under a constant constraint on the normalized KL-divergence, which was mentioned as an open problem in Han’s textbook [4].

Finally, we remark that the second order asymptotics corresponds to the central limit theorem in the i.i.d. case, while the first order asymptotics corresponds to the law of large numbers. In statistics, by contrast, the first order asymptotics corresponds to the central limit theorem. Concerning variable-length source coding, the second order asymptotics also corresponds to the central limit theorem, but its order is $\log n$. As seen in Sections VIII and IX-K, the application of this theorem to variable- and fixed-length source coding is different.

This paper is organized as follows. In Section II, we explain some notations for the information spectrum method in the first and the second order asymptotics. In Section III, we treat the first order asymptotics of fixed-length source coding and of intrinsic randomness based on the variational distance, some of which are known. For comparison with several preceding results, we treat several versions of the optimal rate in this section. The second order asymptotics in both settings are discussed as the main result in Section IV. In Section V, we discuss the relation between the second order asymptotics and Folklore for source coding based on the variational distance. In Section VI, we discuss intrinsic randomness based on the KL-divergence, and the relation between Han’s criterion [4] and the second order asymptotics. For comparison with Han’s result [4], we also treat intrinsic randomness under another KL-divergence criterion, in which the two input distributions of the KL-divergence are exchanged. In Section VII, the Markovian case is discussed. A universal fixed-length source code and a universal operation for intrinsic randomness are treated in Section VIII. All proofs are given in Section IX.


II. Notations of information spectrum 


In this paper, we treat general information sources. Through this treatment, we can understand the essential properties of the problems discussed in this paper. First, we focus on a sequence of probability spaces $\{\Omega_n\}_{n=1}^{\infty}$ and a sequence of probability distributions $p = \{p_n\}_{n=1}^{\infty}$ on them. The asymptotic behavior of the logarithm of the likelihood can be characterized by the following known quantities:

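$$\underline{H}(\epsilon|p) \stackrel{\rm def}{=} \inf_a\Big\{a \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} \ge \epsilon\Big\} = \sup_a\Big\{a \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} < \epsilon\Big\},$$

$$\overline{H}(\epsilon|p) \stackrel{\rm def}{=} \inf_a\Big\{a \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} \ge \epsilon\Big\} = \sup_a\Big\{a \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} < \epsilon\Big\},$$

for $0 < \epsilon \le 1$, and

$$\underline{H}^+(\epsilon|p) \stackrel{\rm def}{=} \inf_a\Big\{a \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} > \epsilon\Big\} = \sup_a\Big\{a \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} \le \epsilon\Big\},$$

$$\overline{H}^+(\epsilon|p) \stackrel{\rm def}{=} \inf_a\Big\{a \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} > \epsilon\Big\} = \sup_a\Big\{a \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} \le \epsilon\Big\},$$

for $0 \le \epsilon < 1$, where $\omega$ is an element of the probability space $\Omega_n$.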
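To make these definitions concrete, the event $\{-\frac{1}{n}\log p_n(\omega) < a\}$ can be evaluated exactly for a Bernoulli i.i.d. source by grouping sequences by their number of ones. The following Python sketch (the Bernoulli source and all function names are illustrative assumptions, not part of the paper) computes the finite-$n$ analogue of $\inf_a\{a \mid p_n\{-\frac{1}{n}\log p_n(\omega) < a\} \ge \epsilon\}$ in nats. As $n$ grows, it approaches the entropy for every $\epsilon$, in line with the law-of-large-numbers remark that follows.

```python
import math


def bernoulli_spectrum(q, n):
    """Atoms of -(1/n) log P^n(omega) for an i.i.d. Bernoulli(q) source.

    Sequences with the same number k of ones have the same probability, so the
    distribution of -(1/n) log P^n has at most n + 1 atoms (value, probability)."""
    atoms = []
    for k in range(n + 1):
        # log C(n, k) via lgamma, so that no huge integers or floats appear
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_seq_prob = k * math.log(q) + (n - k) * math.log(1 - q)
        atoms.append((-log_seq_prob / n, math.exp(log_count + log_seq_prob)))
    atoms.sort()
    return atoms


def spectrum_quantile(q, n, eps):
    """Finite-n version of inf{a | P^n{-(1/n) log P^n(omega) < a} >= eps}.

    P^n{X < a} jumps just above each atom, so the infimum is the first atom
    whose cumulative probability reaches eps."""
    atoms = bernoulli_spectrum(q, n)
    cumulative = 0.0
    for value, mass in atoms:
        cumulative += mass
        if cumulative >= eps:
            return value
    return atoms[-1][0]  # only reached if rounding keeps the total just below eps


def entropy(q):
    """H(P) in nats for P = Bernoulli(q)."""
    return -(q * math.log(q) + (1 - q) * math.log(1 - q))
```

For example, with `q = 0.2` the quantiles for `eps = 0.1` and `eps = 0.9` both approach $H(P) \approx 0.5004$ nats as `n` grows, while their gap shrinks roughly like $1/\sqrt{n}$.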

For example, when the distribution $p_n$ is the $n$-fold independent and identical distribution (i.i.d.) $P^n$ of $P$, the law of large numbers guarantees that these quantities coincide with the entropy $H(P) \stackrel{\rm def}{=} -\sum_{\omega} P(\omega)\log P(\omega)$. Therefore, for a more detailed description of the asymptotic behavior, we introduce the following quantities.


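$$\underline{H}(\epsilon,a|p) \stackrel{\rm def}{=} \inf_b\Big\{b \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b}{\sqrt{n}}\Big\} \ge \epsilon\Big\} = \sup_b\Big\{b \,\Big|\, \varlimsup_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b}{\sqrt{n}}\Big\} < \epsilon\Big\},$$

$$\overline{H}(\epsilon,a|p) \stackrel{\rm def}{=} \inf_b\Big\{b \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b}{\sqrt{n}}\Big\} \ge \epsilon\Big\} = \sup_b\Big\{b \,\Big|\, \varliminf_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b}{\sqrt{n}}\Big\} < \epsilon\Big\},$$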

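for $0 < \epsilon \le 1$. Similarly, $\underline{H}^+(\epsilon,a|p)$ and $\overline{H}^+(\epsilon,a|p)$ are defined for $0 \le \epsilon < 1$. When the distribution $p_n$ is the i.i.d. $P^n$ of $P$, the central limit theorem guarantees that $\sqrt{n}\,\big(-\frac{1}{n}\log P^n(\omega_n) - H(P)\big)$ asymptotically obeys the normal distribution with expectation 0 and variance $V_P \stackrel{\rm def}{=} \sum_\omega P(\omega)\big(-\log P(\omega) - H(P)\big)^2$. Therefore, by using the distribution function $\Phi$ of the standard normal distribution (with expectation 0 and variance 1):

$$\Phi(x) \stackrel{\rm def}{=} \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy,$$

we can express the above quantities as follows:

$$\underline{H}(\epsilon,H(P)|\mathbf{P}) = \overline{H}(\epsilon,H(P)|\mathbf{P}) = \underline{H}^+(\epsilon,H(P)|\mathbf{P}) = \overline{H}^+(\epsilon,H(P)|\mathbf{P}) = \sqrt{V_P}\,\Phi^{-1}(\epsilon) \tag{1}$$

for $0 < \epsilon < 1$, where $\mathbf{P} = \{P^n\}$.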
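Equation (1) can be checked numerically. The sketch below, again assuming a Bernoulli($q$) source with natural logarithms (illustrative choices, as are the function names), computes the exact finite-$n$ quantile of $(-\log P^n(\omega_n) - nH(P))/\sqrt{n}$ and compares it with $\sqrt{V_P}\,\Phi^{-1}(\epsilon)$, using `statistics.NormalDist` from the Python standard library for $\Phi^{-1}$.

```python
import math
from statistics import NormalDist


def entropy_and_varentropy(q):
    """H(P) and V_P (in nats) for P = Bernoulli(q)."""
    probs = (q, 1 - q)
    h = -sum(p * math.log(p) for p in probs)
    v = sum(p * (-math.log(p) - h) ** 2 for p in probs)
    return h, v


def second_order_quantile(q, n, eps):
    """Finite-n analogue of H(eps, H(P)|P): the smallest atom b of
    (-log P^n(omega) - n H(P)) / sqrt(n) whose cumulative probability reaches eps."""
    h, _ = entropy_and_varentropy(q)
    atoms = []
    for k in range(n + 1):  # k = number of ones; all such sequences are equiprobable
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_seq_prob = k * math.log(q) + (n - k) * math.log(1 - q)
        b = (-log_seq_prob - n * h) / math.sqrt(n)
        atoms.append((b, math.exp(log_count + log_seq_prob)))
    atoms.sort()
    cumulative = 0.0
    for b, mass in atoms:
        cumulative += mass
        if cumulative >= eps:
            return b
    return atoms[-1][0]


def gaussian_prediction(q, eps):
    """Right-hand side of (1): sqrt(V_P) * Phi^{-1}(eps)."""
    _, v = entropy_and_varentropy(q)
    return math.sqrt(v) * NormalDist().inv_cdf(eps)
```

The exact quantile moves on a lattice of step $|\log\frac{1-q}{q}|/\sqrt{n}$, so agreement with the Gaussian value is only up to that granularity at finite $n$.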

In the following, we discuss the relation between the above 
mentioned quantities, fixed-length source coding, and intrinsic 
randomness. 



III. First order asymptotics 
A. Fixed-length source coding 

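In fixed-length source coding, we first fix a set of integers $\mathcal{M}_n \stackrel{\rm def}{=} \{1,\ldots,M_n\}$. The transformation from the output $\omega \in \Omega_n$ to an element of the set $\mathcal{M}_n$ is described by a map $\phi_n: \Omega_n \to \mathcal{M}_n$, which is called encoding.

Fig. 2. Encoding operation in source coding

The operation recovering the original output $\omega$ from an element of $\mathcal{M}_n$ is described by a map $\psi_n: \mathcal{M}_n \to \Omega_n$, which is called decoding. We call the triple $\Phi_n \stackrel{\rm def}{=} (\mathcal{M}_n, \phi_n, \psi_n)$ a code, and evaluate its performance by its size $|\Phi_n| \stackrel{\rm def}{=} |\mathcal{M}_n| = M_n$ and its error probability

$$\varepsilon_{p_n}(\Phi_n) \stackrel{\rm def}{=} p_n\{\omega \in \Omega_n \mid \psi_n \circ \phi_n(\omega) \neq \omega\}.$$

When we do not need to express the distribution $p_n$ of the information source, we simplify $\varepsilon_{p_n}(\Phi_n)$ to $\varepsilon(\Phi_n)$. In order to discuss the asymptotic bound of the compression rate under a constant constraint on the error probability, we focus on the following values:

$$R(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

$$R^{\dagger}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

$$R^{\ddagger}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

for $0 \le \epsilon < 1$, and

$$R^+(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) < \epsilon\Big\},$$

$$R^{\dagger}_{+}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Phi_n) < \epsilon\Big\},$$

$$R^{\ddagger}_{+}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) < \epsilon\Big\},$$

for $0 < \epsilon \le 1$. Further, as intermediate quantities, we define

$$\tilde{R}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varepsilon(\Phi_n) \le \epsilon,\ \forall n\Big\} = \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \exists N,\ \varepsilon(\Phi_n) \le \epsilon,\ \forall n \ge N\Big\},$$

$$\tilde{R}^{\dagger}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varepsilon(\Phi_n) \le \epsilon \text{ for infinitely many } n\Big\},$$

$$\tilde{R}^{\ddagger}(\epsilon|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varepsilon(\Phi_n) \le \epsilon,\ \forall n\Big\} = \inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \exists N,\ \varepsilon(\Phi_n) \le \epsilon,\ \forall n \ge N\Big\},$$

for $0 < \epsilon < 1$. Here, in order to see the relation with existing results, we defined many versions of the optimal coding length. The following relations follow from their definitions:

$$R(\epsilon|p) \le \tilde{R}(\epsilon|p) \le R^+(\epsilon|p), \tag{2}$$

$$R^{\dagger}(\epsilon|p) \le \tilde{R}^{\dagger}(\epsilon|p) \le R^{\dagger}_{+}(\epsilon|p), \tag{3}$$

$$R^{\ddagger}(\epsilon|p) \le \tilde{R}^{\ddagger}(\epsilon|p) \le R^{\ddagger}_{+}(\epsilon|p), \tag{4}$$

for $0 < \epsilon < 1$.

Concerning these quantities, the following theorem holds.

Theorem 1 (Han [4, Theorem 1.6.1], Steinberg & Verdú [8], Chen & Alajaji [9], Nagaoka & Hayashi [10]): The relations

$$R(1-\epsilon|p) = \overline{H}(\epsilon|p), \tag{5}$$

$$R^{\dagger}(1-\epsilon|p) = R^{\ddagger}(1-\epsilon|p) = \underline{H}(\epsilon|p) \tag{6}$$

hold for $0 < \epsilon \le 1$, and the relations

$$R^+(1-\epsilon|p) = \overline{H}^+(\epsilon|p), \tag{7}$$

$$R^{\dagger}_{+}(1-\epsilon|p) = R^{\ddagger}_{+}(1-\epsilon|p) = \underline{H}^+(\epsilon|p) \tag{8}$$

hold for $0 \le \epsilon < 1$.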

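By using the relations (2), (3), and (4), the quantities $\tilde{R}(\epsilon|p)$, $\tilde{R}^{\dagger}(\epsilon|p)$, and $\tilde{R}^{\ddagger}(\epsilon|p)$ are characterized as follows.

Corollary 1: For $0 < \epsilon < 1$,

$$\overline{H}(\epsilon|p) \le \tilde{R}(1-\epsilon|p) \le \overline{H}^+(\epsilon|p), \tag{9}$$

$$\underline{H}(\epsilon|p) \le \tilde{R}^{\dagger}(1-\epsilon|p) \le \underline{H}^+(\epsilon|p), \tag{10}$$

$$\underline{H}(\epsilon|p) \le \tilde{R}^{\ddagger}(1-\epsilon|p) \le \underline{H}^+(\epsilon|p). \tag{11}$$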
Remark 1: Historically, Steinberg & Verdú [8] derived (9), and Chen & Alajaji [9] derived (10). Han [4] proved the equation $R(1-\epsilon|p) = \overline{H}(\epsilon|p)$. Following these results, Nagaoka & Hayashi [10] proved $R^+(1-\epsilon|p) = \overline{H}^+(\epsilon|p)$. The other relations are proved for the first time in this paper.

The bounds with the subscript or superscript $+$ evaluated at $\epsilon = 1$, such as $R^+(1|p)$, give the shortest coding lengths among the above bounds, because $R(\epsilon|p)$, $R^{\dagger}(\epsilon|p)$, $R^{\ddagger}(\epsilon|p)$, $\tilde{R}(\epsilon|p)$, $\tilde{R}^{\dagger}(\epsilon|p)$, and $\tilde{R}^{\ddagger}(\epsilon|p)$ are not defined for $\epsilon = 1$. Hence, these bounds are used in the discussion concerning the strong converse property.

B. Intrinsic randomness 

Next, we consider the problem of constructing an approximately uniform probability distribution from a biased probability distribution $p_n$ on $\Omega_n$. We call this problem intrinsic randomness, and in this section we discuss it based on (half) the variational distance. Our operation is described by the pair $\Psi_n = (\mathcal{M}_n, \phi_n)$ of the size $M_n$ of the target uniform probability distribution and the map $\phi_n$ from $\Omega_n$ to $\mathcal{M}_n \stackrel{\rm def}{=} \{1,\ldots,M_n\}$.

Fig. 3. Typical operation of intrinsic randomness

The performance of $\Psi_n = (\mathcal{M}_n, \phi_n)$ is characterized by the size $|\Psi_n| \stackrel{\rm def}{=} M_n$ and half of the variational distance between the target distribution and the constructed distribution:

$$\varepsilon_{p_n}(\Psi_n) \stackrel{\rm def}{=} d(p_n \circ \phi_n^{-1}, p_{U,\mathcal{M}_n}), \tag{12}$$

where $d(p,q) \stackrel{\rm def}{=} \frac{1}{2}\sum_\omega |p(\omega) - q(\omega)|$ and $p_{U,S}$ is the uniform distribution on $S$. When we do not need to express the distribution $p_n$ of the information source, we simplify $\varepsilon_{p_n}(\Psi_n)$ to $\varepsilon(\Psi_n)$. Under the condition that this distance is less than $\epsilon$, the optimal size is asymptotically characterized as follows:

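$$S(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

$$S^{\dagger}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

$$S^{\ddagger}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

for $0 < \epsilon \le 1$, and

$$S^+(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) \le \epsilon\Big\},$$

$$S^{\dagger}_{+}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Psi_n) \le \epsilon\Big\},$$

$$S^{\ddagger}_{+}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) \le \epsilon\Big\},$$

for $0 \le \epsilon < 1$. As intermediate quantities,

$$\tilde{S}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varepsilon(\Psi_n) \le \epsilon,\ \forall n\Big\},$$

$$\tilde{S}^{\dagger}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varepsilon(\Psi_n) \le \epsilon \text{ for infinitely many } n\Big\},$$

$$\tilde{S}^{\ddagger}(\epsilon|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varepsilon(\Psi_n) \le \epsilon,\ \forall n\Big\}$$

are defined for $0 < \epsilon < 1$. Similarly, we obtain the following trivial relations:

$$S(\epsilon|p) \le \tilde{S}(\epsilon|p) \le S^+(\epsilon|p), \tag{13}$$

$$S^{\dagger}(\epsilon|p) \le \tilde{S}^{\dagger}(\epsilon|p) \le S^{\dagger}_{+}(\epsilon|p), \tag{14}$$

$$S^{\ddagger}(\epsilon|p) \le \tilde{S}^{\ddagger}(\epsilon|p) \le S^{\ddagger}_{+}(\epsilon|p), \tag{15}$$

for $0 < \epsilon < 1$.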

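These quantities are characterized by the following theorem.

Theorem 2 (Han [4, Theorem 2.4.2]): The relations

$$S(\epsilon|p) = \underline{H}(\epsilon|p), \quad S^{\dagger}(\epsilon|p) = S^{\ddagger}(\epsilon|p) = \overline{H}(\epsilon|p) \tag{16}$$

hold for $0 < \epsilon \le 1$, and the relations

$$S^+(\epsilon|p) = \underline{H}^+(\epsilon|p), \quad S^{\dagger}_{+}(\epsilon|p) = S^{\ddagger}_{+}(\epsilon|p) = \overline{H}^+(\epsilon|p) \tag{17}$$

hold for $0 \le \epsilon < 1$.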

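Similarly, the following corollary holds.

Corollary 2: The relations

$$\underline{H}(\epsilon|p) \le \tilde{S}(\epsilon|p) \le \underline{H}^+(\epsilon|p), \tag{18}$$

$$\overline{H}(\epsilon|p) \le \tilde{S}^{\dagger}(\epsilon|p) \le \overline{H}^+(\epsilon|p), \tag{19}$$

$$\overline{H}(\epsilon|p) \le \tilde{S}^{\ddagger}(\epsilon|p) \le \overline{H}^+(\epsilon|p) \tag{20}$$

hold for $0 < \epsilon < 1$.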

Remark 2: Han [4] proved only the first equation of (16). The other equations are proved for the first time in this paper.

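In the following, in order to treat Folklore for source coding, we focus on the operation $\Psi_n = (\mathcal{M}_n, \phi_n)$ defined from the code $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$. For fixed real numbers $\epsilon$ and $\epsilon'$ satisfying $0 < \epsilon, \epsilon' < 1$, we consider whether there exist codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ such that

$$\varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon, \quad \varlimsup_{n\to\infty}\varepsilon(\Psi_n) < \epsilon'. \tag{21}$$

If there exists a sequence of codes $\{\Phi_n\}$ satisfying the above conditions, the inequalities

$$\underline{H}(\epsilon'|p) = S(\epsilon'|p) \ge \varliminf_{n\to\infty}\frac{1}{n}\log M_n \ge R^{\ddagger}(\epsilon|p) = \underline{H}(1-\epsilon|p),$$

$$\overline{H}(\epsilon'|p) = S^{\ddagger}(\epsilon'|p) \ge \varlimsup_{n\to\infty}\frac{1}{n}\log M_n \ge R(\epsilon|p) = \overline{H}(1-\epsilon|p)$$

hold. Therefore, we obtain the following necessary condition for the existence of $\{\Phi_n\}$ satisfying (21):

$$\underline{H}(\epsilon'|p) \ge \underline{H}(1-\epsilon|p), \quad \overline{H}(\epsilon'|p) \ge \overline{H}(1-\epsilon|p). \tag{22}$$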

Thus, the necessary condition (22) is satisfied in the case of the i.i.d. $P^n$, because these quantities coincide with the entropy $H(P)$.

However, the above discussion is not sufficient because, as is shown based on the second order asymptotics, a stronger necessary condition exists.

IV. Second order asymptotics 

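Next, we proceed to the second order asymptotics, which is very useful for obtaining a necessary condition stronger than (22). Since the values $\underline{H}(\epsilon'|p)$ and $\overline{H}(\epsilon|p)$ are independent of $\epsilon$ in the i.i.d. case, we introduce the following values in order to treat the dependence on $\epsilon$:

$$R(\epsilon,a|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

$$R^{\dagger}(\epsilon,a|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}} \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

$$R^{\ddagger}(\epsilon,a|p) \stackrel{\rm def}{=} \inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\},$$

for $0 \le \epsilon < 1$, and

$$S(\epsilon,a|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

$$S^{\dagger}(\epsilon,a|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} \,\Big|\, \varliminf_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

$$S^{\ddagger}(\epsilon,a|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}\varepsilon(\Psi_n) < \epsilon\Big\},$$

for $0 < \epsilon \le 1$. While we can define the other quantities $R^+(\epsilon,a|p)$, $R^{\dagger}_{+}(\epsilon,a|p)$, $R^{\ddagger}_{+}(\epsilon,a|p)$, $S^+(\epsilon,a|p)$, $S^{\dagger}_{+}(\epsilon,a|p)$, and $S^{\ddagger}_{+}(\epsilon,a|p)$, we treat only the above values in this section, because the latter values can be treated in a similar way. The following theorem holds.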

Theorem 3:

$$S(\epsilon,a|p) = R^{\dagger}(1-\epsilon,a|p) = R^{\ddagger}(1-\epsilon,a|p) = \underline{H}(\epsilon,a|p),$$

$$S^{\dagger}(\epsilon,a|p) = S^{\ddagger}(\epsilon,a|p) = R(1-\epsilon,a|p) = \overline{H}(\epsilon,a|p).$$

In particular, in the case of the i.i.d. $P^n$, as is characterized in (1), these quantities with $a = H(P)$ depend on $\epsilon$.
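By Theorem 3 and (1), the optimal second order coefficient of fixed-length coding for an i.i.d. source with error $\epsilon$ is $R(\epsilon,H(P)|\mathbf{P}) = \overline{H}(1-\epsilon,H(P)|\mathbf{P}) = \sqrt{V_P}\,\Phi^{-1}(1-\epsilon)$. The following sketch, assuming a Bernoulli($q$) source (an illustrative choice; function names are hypothetical), computes the exact minimal code size at finite $n$ by keeping the most probable sequences, which is the optimal encoder for a given size, and compares the normalized excess over $nH(P)$ with that prediction. It ignores rounding the number of kept sequences of the last type up to an integer, which changes $\log M_n$ negligibly.

```python
import math
from statistics import NormalDist


def log_min_code_size(q, n, eps):
    """log of the smallest M_n with error probability <= eps for an i.i.d.
    Bernoulli(q) source: keep the most probable sequences, type by type."""
    types = []
    for k in range(n + 1):
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_seq_prob = k * math.log(q) + (n - k) * math.log(1 - q)
        types.append((log_seq_prob, log_count))
    types.sort(reverse=True)      # most probable sequences first
    target = 1.0 - eps            # probability that must be decoded correctly
    covered = 0.0
    log_terms = []                # log of the number of kept sequences, per type
    for log_seq_prob, log_count in types:
        mass = math.exp(log_count + log_seq_prob)
        if covered + mass >= target:
            needed = target - covered          # keep only part of this type
            if needed > 0.0:
                log_terms.append(math.log(needed) - log_seq_prob)
            break
        covered += mass
        log_terms.append(log_count)
    top = max(log_terms)          # log-sum-exp of the per-type counts
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def second_order_rate(q, n, eps):
    """(log M_n - n H(P)) / sqrt(n) for the optimal code."""
    h = -(q * math.log(q) + (1 - q) * math.log(1 - q))
    return (log_min_code_size(q, n, eps) - n * h) / math.sqrt(n)


def predicted_rate(q, eps):
    """sqrt(V_P) * Phi^{-1}(1 - eps), the value given by Theorem 3 and (1)."""
    h = -(q * math.log(q) + (1 - q) * math.log(1 - q))
    v = q * (-math.log(q) - h) ** 2 + (1 - q) * (-math.log(1 - q) - h) ** 2
    return math.sqrt(v) * NormalDist().inv_cdf(1 - eps)
```

At finite $n$ the agreement is approximate: the remaining gap comes from a lower-order term that decays like $(\log n)/\sqrt{n}$.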

V. Relation to Folklore for source coding 
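Next, we apply Theorem 3 to the relation between the code $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ and the operation $\Psi_n = (\mathcal{M}_n, \phi_n)$. When

$$\lim_{n\to\infty}\varepsilon(\Phi_n) = \epsilon, \quad \lim_{n\to\infty}\varepsilon(\Psi_n) = \epsilon', \tag{23}$$

similarly to the previous section, we can derive the following inequalities:

$$\underline{H}(\epsilon',a|p) \ge \underline{H}(1-\epsilon,a|p), \quad \overline{H}(\epsilon',a|p) \ge \overline{H}(1-\epsilon,a|p).$$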

Thus, if $\underline{H}(\epsilon',a|p)$ or $\overline{H}(\epsilon',a|p)$ is continuous and strictly increasing with respect to $\epsilon'$, as in the i.i.d. case, the above inequalities yield $\epsilon' \ge 1 - \epsilon$. That is, the following trade-off holds between the error probability of compression and the performance of intrinsic randomness:

$$\lim_{n\to\infty}\varepsilon(\Phi_n) + \lim_{n\to\infty}\varepsilon(\Psi_n) \ge 1. \tag{24}$$
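For the i.i.d. case, the step from Theorem 3 to (24) can be written out by substituting (1). The short derivation below assumes $a = H(P)$ and $V_P > 0$, so that $\Phi^{-1}$ is only applied to arguments in $(0,1)$:

```latex
\begin{align*}
\sqrt{V_P}\,\Phi^{-1}(\epsilon')
  &= \underline{H}(\epsilon', H(P)|\mathbf{P})
   \;\ge\; \underline{H}(1-\epsilon, H(P)|\mathbf{P})
   = \sqrt{V_P}\,\Phi^{-1}(1-\epsilon) \\
\Longrightarrow\quad \epsilon' &\ge 1-\epsilon
  \qquad (\Phi^{-1} \text{ is strictly increasing on } (0,1)).
\end{align*}
```

Read the other way, a code whose error tends to $\epsilon$ has second order coefficient $b = \sqrt{V_P}\,\Phi^{-1}(1-\epsilon)$, and by Theorem 3 and (1) the distance that intrinsic randomness can reach at that same $b$ is no smaller than $\Phi(b/\sqrt{V_P}) = 1-\epsilon$.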

Therefore, Folklore for source coding does not hold with respect to the variational distance. In other words, generating completely uniform random numbers requires over-compression. More generally, the following theorem holds.

Theorem 4: We define the distance from the uniform distribution as follows:

$$\delta(p_n) \stackrel{\rm def}{=} \min_{S \subset \Omega_n} d(p_n, p_{U,S}). \tag{25}$$

Then the following inequality holds:

$$\varepsilon(\Phi_n) + \varepsilon(\Psi_n) \ge \delta(p_n). \tag{26}$$

In particular, in the i.i.d. case with $V_P > 0$, the quantity $\delta(p_n)$ goes to 1. In such a case, the trade-off relation

$$\varliminf_{n\to\infty}\big(\varepsilon(\Phi_n) + \varepsilon(\Psi_n)\big) \ge 1 \tag{27}$$

holds. Furthermore, the above trade-off inequality (27) is tight, as is indicated by the following theorem.

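Theorem 5: When the convergence

$$\lim_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b+\gamma}{\sqrt{n}}\Big\}$$

is uniform with respect to $\gamma$ in a sufficiently small neighborhood of 0 and the relation

$$\lim_{\gamma\to 0}\lim_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega_n) < a + \frac{b+\gamma}{\sqrt{n}}\Big\} = \epsilon$$

holds, there exists a sequence of codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ (with $\Psi_n = (\mathcal{M}_n, \phi_n)$) satisfying the following conditions:

$$\varlimsup_{n\to\infty}\varepsilon(\Phi_n) \le 1-\epsilon, \quad \varlimsup_{n\to\infty}\varepsilon(\Psi_n) \le \epsilon, \tag{28}$$

$$\lim_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{M_n}{e^{na}} = b. \tag{29}$$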
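Inequality (26) of Theorem 4 can be checked by brute force on a small alphabet. The sketch below (function names are hypothetical) computes $d$, the quantity $\delta(p)$ of (25) using the fact that, for a fixed size $k$, the best $S$ consists of the $k$ most probable outcomes and $d(p, p_{U,S}) = 1 - \sum_{\omega \in S}\min(p(\omega), 1/k)$, and the two errors of a code $\Phi = (\mathcal{M},\phi,\psi)$ and its operation $\Psi = (\mathcal{M},\phi)$, with $\psi$ chosen as the most probable preimage of each message. It then verifies (26) for every encoder $\phi$ into at most $|\Omega|$ messages.

```python
import itertools


def half_variational_distance(p, q):
    """d(p, q) = (1/2) * sum_x |p(x) - q(x)| over the union of the supports."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def delta(p):
    """delta(p) = min over nonempty S of d(p, uniform on S), as in (25)."""
    probs = sorted(p.values(), reverse=True)
    return min(1.0 - sum(min(v, 1.0 / k) for v in probs[:k])
               for k in range(1, len(probs) + 1))


def code_errors(p, phi, m):
    """Errors of Phi = (M, phi, psi) and Psi = (M, phi) for messages 0..m-1.

    psi decodes each used message to its most probable preimage, which
    minimizes the decoding error for the given encoder phi."""
    preimages = {j: [] for j in range(m)}
    for x, j in phi.items():
        preimages[j].append(x)
    correct = sum(max(p[x] for x in xs) for xs in preimages.values() if xs)
    pushed = {j: sum(p[x] for x in xs) for j, xs in preimages.items()}
    uniform = {j: 1.0 / m for j in range(m)}
    return 1.0 - correct, half_variational_distance(pushed, uniform)


def check_theorem4(p):
    """Brute-force check of (26) over every map phi into m = 1..|Omega| messages."""
    omega = list(p)
    lower = delta(p)
    for m in range(1, len(omega) + 1):
        for labels in itertools.product(range(m), repeat=len(omega)):
            eps_code, eps_ir = code_errors(p, dict(zip(omega, labels)), m)
            if eps_code + eps_ir < lower - 1e-12:
                return False
    return True
```

For instance, with $p = (0.4, 0.3, 0.2, 0.1)$ the encoder merging the two least likely outcomes gives $\varepsilon(\Phi) = 0.1$ and $\varepsilon(\Psi) = 1/15$, whose sum equals $\delta(p) = 1/6$, so (26) can hold with equality.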


VI. Intrinsic randomness based on KL-divergence criterion

A. First order asymptotics 

Next, we discuss intrinsic randomness based on the KL-divergence. Since Han [1] discussed Folklore for source coding based on the KL-divergence criterion, we need this type of discussion in order to compare our result with Han’s result. The first work on intrinsic randomness based on the KL-divergence was done by Vembu & Verdú [11]. They focused on the normalized KL-divergence

$$\frac{1}{n}D(p_n \circ \phi_n^{-1} \,\|\, p_{U,\mathcal{M}_n}), \tag{30}$$

where $D(p\|q)$ is the KL-divergence $\sum_\omega p(\omega)\log\frac{p(\omega)}{q(\omega)}$. Han [1] called the sequence of distributions $p_n \circ \phi_n^{-1}$ ‘almost’ uniform random if the above value goes to 0.
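The KL-divergence (30) and its reversed order, which is treated later in this section, behave very differently. A minimal Python sketch (natural logarithms; names are illustrative) evaluating both for an output distribution $p_n \circ \phi_n^{-1}$ given as a dictionary over messages: the first order satisfies $D(q\|p_{U,\mathcal{M}}) = \log M - H(q)$, while the reversed order is infinite as soon as some message has probability zero.

```python
import math


def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)) in nats; +inf if p has mass where q has none."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # 0 log 0 = 0 by convention
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def divergences_from_uniform(pushed):
    """(D(q || U_M), D(U_M || q)) for q = p_n o phi_n^{-1} on messages 0..M-1."""
    m = len(pushed)
    uniform = {j: 1.0 / m for j in range(m)}
    return kl_divergence(pushed, uniform), kl_divergence(uniform, pushed)
```

For example, `divergences_from_uniform({0: 1.0, 1: 0.0})` returns `log 2` for the first order but infinity for the reversed one.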

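Proposition 1 (Vembu & Verdú [11, Theorem 1]):

$$S^*(p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \lim_{n\to\infty}\frac{1}{n}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) = 0\Big\} = \underline{H}(p) \stackrel{\rm def}{=} \sup_a\Big\{a \,\Big|\, \lim_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} = 0\Big\}. \tag{31}$$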
In a thorough discussion of the above proposition, Han [1] worked out the following proposition concerning Folklore for source coding.

Proposition 2 (Han [1, Theorem 31]): The following three conditions for the sequence $p = \{p_n\}$ are equivalent:

• When a sequence of codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ satisfies $\varepsilon(\Phi_n) \to 0$ and $\frac{1}{n}\log|\mathcal{M}_n| \to \overline{H}(p)$, the value (30) goes to 0.

• There exists a sequence of codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ satisfying $\varepsilon(\Phi_n) \to 0$ and $\frac{1}{n}\log|\mathcal{M}_n| \to \overline{H}(p)$ such that the value (30) goes to 0.

• The sequence $p = \{p_n\}$ satisfies the strong converse property:

$$\underline{H}(p) = \overline{H}(p) \stackrel{\rm def}{=} \inf_a\Big\{a \,\Big|\, \lim_{n\to\infty} p_n\Big\{-\frac{1}{n}\log p_n(\omega) < a\Big\} = 1\Big\}. \tag{32}$$

In order to discuss Folklore for source coding under the KL-divergence criterion, we need to generalize Vembu & Verdú’s theorem as follows.

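Theorem 6: Assume that $\overline{H}(\epsilon|p) = \underline{H}(\epsilon|p)$ for $0 < \epsilon < 1$. We define the probability distribution function $F$ by

$$\int_0^{\underline{H}(\epsilon|p)} F(dx) = \epsilon.$$

Then, the inequality

$$\varliminf_{n\to\infty}\frac{1}{n}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \ge \int_0^a (a-x)F(dx) \tag{33}$$

holds, where $a = \lim_{n\to\infty}\frac{1}{n}\log M_n$. Furthermore, when $\overline{H}(1-\epsilon|p) = a$, there exists a sequence of codes $\{\Phi_n\}$ attaining the equality in (33) and satisfying $\lim_{n\to\infty}\varepsilon(\Phi_n) = \epsilon$. Here, we remark that the inequality (33) is equivalent to the inequality

$$\varlimsup_{n\to\infty}\frac{1}{n}H(p_n\circ\phi_n^{-1}) \le \int_0^a xF(dx) + a\big(1-F(a)\big). \tag{34}$$

Note that the following equation follows from the above theorem:

$$S^*(\delta|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\frac{1}{n}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \le \delta\Big\} = \sup_a\Big\{a \,\Big|\, \int_0^a (a-x)F(dx) \le \delta\Big\}.$$

Remark 3: The characterization of $S^*(\delta|p)$ as a function of $\delta$ was treated as an open problem in Han’s textbook [4].

In the i.i.d. case of the probability distribution $P$, since

$$\int_0^a (a-x)F(dx) = \begin{cases} a - H(P) & a \ge H(P), \\ 0 & a < H(P), \end{cases}$$

we obtain

$$S^*(\delta|\mathbf{P}) = H(P) + \delta.$$

Next, we focus on the opposite criterion $D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1})$, and define the following quantities:

$$S^*_1(\delta|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \le \delta\Big\},$$

$$S^*_2(\delta|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{n}\log|\Psi_n| \,\Big|\, \varlimsup_{n\to\infty}\frac{1}{n}D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \le \delta\Big\}.$$

Then, they are characterized as follows.

Theorem 7:

$$S^*_1(\delta|p) = \overline{H}(1-e^{-\delta}|p). \tag{35}$$

If the limit

$$\sigma(a) \stackrel{\rm def}{=} \lim_{n\to\infty} -\frac{1}{n}\log p_n\Big\{-\frac{1}{n}\log p_n(\omega) > a\Big\}$$

exists, the relation

$$S^*_2(\delta|p) = \sup_a\{a - \sigma(a) \mid \sigma(a) \le \delta\} \tag{36}$$

holds for all $\delta > 0$.

Remark 4: Indeed, Han [4] proved a similar relation concerning fixed-length source coding with a constraint on the error exponent:

$$\inf_{\{\Phi_n\}}\Big\{\varlimsup_{n\to\infty}\frac{1}{n}\log|\Phi_n| \,\Big|\, \varliminf_{n\to\infty}-\frac{1}{n}\log\varepsilon(\Phi_n) \ge r\Big\} = \sup_a\{a - \sigma(a) \mid \sigma(a) \le r\}, \tag{37}$$

where

$$\sigma(a) = \lim_{n\to\infty} -\frac{1}{n}\log p_n\Big\{-\frac{1}{n}\log p_n(\omega) > a\Big\}.$$

Moreover, Nagaoka and Hayashi [10] proved that equation (37) holds when we define $\sigma(a)$ by

$$\sigma(a) = \lim_{n\to\infty} -\frac{1}{n}\log p_n\Big\{-\frac{1}{n}\log p_n(\omega) > a\Big\}. \tag{38}$$

Hence, when the limit $\sigma(a)$ exists, equation (37) holds with $\sigma(a)$ given by (38).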

Further, Hayashi [12] showed that when the limit $\sigma(a)$ exists, $\sup_a\{a - \sigma(a) \mid \sigma(a) \le r\}$ is equal to the bound of the generation rate of maximally entangled states under an exponential constraint on the success probability, with the correspondence of each probability to the square of the Schmidt coefficient.

In the i.i.d. case of $P$, these quantities are calculated as

$$S^*_1(\delta|\mathbf{P}) = H(P), \qquad S^*_2(\delta|\mathbf{P}) = \min_{0 \le s < 1}\frac{s\delta + \psi(s)}{1-s}, \qquad \psi(s) \stackrel{\rm def}{=} \log\sum_\omega P(\omega)^s, \tag{39}$$

where we use the known value of the left-hand side of (37) in the calculation of (39).

Remark 5: As is discussed in Theorem 3 of Hayashi [12], when the limit $\psi(s) \stackrel{\rm def}{=} \lim_{n\to\infty}\frac{1}{n}\log\sum_\omega p_n(\omega)^s$ and its first and second derivatives $\psi'(s)$ and $\psi''(s)$ exist for $s \in (0,1)$, the relation

$$S^*_2(\delta|p) = \min_{0 \le s < 1}\frac{s\delta + \psi(s)}{1-s} \tag{40}$$

holds.
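In the i.i.d. case, (39) and (40) reduce $S^*_2(\delta|\mathbf{P})$ to a one-dimensional minimization over $s$. A minimal sketch, assuming natural logarithms and a plain grid search over $s \in [0,1)$ (a numerical shortcut chosen for illustration, not the method of the paper):

```python
import math


def psi(p, s):
    """psi(s) = log sum_x P(x)^s for a single-letter distribution P, as in (39)."""
    return math.log(sum(px ** s for px in p if px > 0))


def s_star_2(p, delta, grid=20000):
    """Grid approximation of min_{0 <= s < 1} (s*delta + psi(s)) / (1 - s)."""
    best = math.inf
    for i in range(grid):
        s = i / grid
        best = min(best, (s * delta + psi(p, s)) / (1.0 - s))
    return best
```

Since $\psi(s)/(1-s)$ is the Rényi entropy of order $s$, which decreases to $H(P)$ as $s \to 1$, the value at $\delta = 0$ is close to $H(P)$; for large $\delta$ the minimum is attained at $s = 0$, giving $\psi(0) = \log|\mathrm{supp}\,P|$.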

From the above discussion, we find that changing the order 
of input distributions of KL-divergence causes a completely 
different asymptotic behavior. 

B. Second order asymptotics 

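Similarly to the variational distance criterion, in order to discuss Folklore for source coding more deeply, we need to treat the second order asymptotics. For this purpose, we focus on the following values:

$$S^*(\delta,a|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}\frac{1}{\sqrt{n}}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \le \delta\Big\},$$

$$S^*_1(\delta,a|p) \stackrel{\rm def}{=} \sup_{\{\Psi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} \,\Big|\, \varlimsup_{n\to\infty}D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \le \delta\Big\}.$$

Concerning the first value, the following theorem holds.

Theorem 8: Assume that the condition (32) and the equation $\overline{H}(\epsilon,\underline{H}(p)|p) = \underline{H}(\epsilon,\underline{H}(p)|p)$ hold. Define the probability distribution function $F$ by

$$\int_{-\infty}^{\underline{H}(\epsilon,\underline{H}(p)|p)} F(dx) = \epsilon.$$

Then, the inequality

$$\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \ge \int_{-\infty}^{b}(b-x)F(dx) \tag{41}$$

holds, where $b = \lim_{n\to\infty}\frac{1}{\sqrt{n}}\log\frac{M_n}{e^{n\underline{H}(p)}}$. Furthermore, when $\overline{H}(1-\epsilon,\underline{H}(p)|p) = b$, there exists a sequence of codes $\{\Phi_n\}$ attaining the equality in (41) and satisfying $\lim_{n\to\infty}\varepsilon(\Phi_n) = \epsilon$. Finally, we remark that the inequality (41) is equivalent to the inequality

$$\varlimsup_{n\to\infty}\frac{1}{\sqrt{n}}\Big(H(p_n\circ\phi_n^{-1}) - n\underline{H}(p)\Big) \le \int_{-\infty}^{b}xF(dx) + b\big(1-F(b)\big). \tag{42}$$

Therefore, we obtain

$$S^*(\delta,\underline{H}(p)|p) = \sup_b\Big\{b \,\Big|\, \int_{-\infty}^{b}(b-x)F(dx) \le \delta\Big\}.$$

Concerning the opposite criterion, the following theorem holds.

Theorem 9:

$$S^*_1(\delta,a|p) = \overline{H}(1-e^{-\delta},a|p). \tag{43}$$

In the i.i.d. case of $P$, these quantities are simplified to

$$S^*(\delta,H(P)|\mathbf{P}) = \sup_b\Big\{b \,\Big|\, \int_{-\infty}^{b}\frac{b-x}{\sqrt{2\pi V_P}}e^{-x^2/(2V_P)}\,dx \le \delta\Big\},$$

$$S^*_1(\delta,H(P)|\mathbf{P}) = \sqrt{V_P}\,\Phi^{-1}(1-e^{-\delta}).$$

In particular, when we take the limit $\delta \to 0$, the relations

$$S^*(\delta,H(P)|\mathbf{P}) \to -\infty, \qquad S^*_1(\delta,H(P)|\mathbf{P}) \to -\infty$$

hold. On the other hand, Theorem 3 guarantees that $R^{\dagger}(\epsilon,a|p) = \underline{H}(1-\epsilon,a|p)$, and $\lim_{\epsilon\to 0}\underline{H}(1-\epsilon,H(P)|\mathbf{P}) = +\infty$. Thus, if a sequence of codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ satisfies $\varepsilon(\Phi_n) \to 0$, it satisfies neither

$$\frac{1}{\sqrt{n}}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \to 0 \tag{44}$$

nor

$$D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \to 0. \tag{45}$$

Therefore, even if we focus on the KL-divergence, Folklore for source coding does not hold if we adopt the criterion (44) or (45).

Furthermore, combining the above results with Theorem 3, we obtain the following corollary.

Corollary 3: Assume the same assumptions as in Theorem 8. If the function $\epsilon \mapsto \overline{H}(\epsilon,\underline{H}(p)|p)$ is continuous, then

$$\inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}\frac{1}{\sqrt{n}}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \,\Big|\, \lim_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\} \le \inf\big\{\delta \,\big|\, S^*(\delta,\underline{H}(p)|p) \ge R^{\dagger}(\epsilon,\underline{H}(p)|p)\big\} = \int_{-\infty}^{F^{-1}(1-\epsilon)}\big(F^{-1}(1-\epsilon)-x\big)F(dx), \tag{46}$$

$$\inf_{\{\Phi_n\}}\Big\{\varliminf_{n\to\infty}D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \,\Big|\, \lim_{n\to\infty}\varepsilon(\Phi_n) \le \epsilon\Big\} \le \inf\big\{\delta \,\big|\, S^*_1(\delta,\underline{H}(p)|p) \ge R^{\dagger}(\epsilon,\underline{H}(p)|p)\big\} = -\log\epsilon.$$

In the i.i.d. case, the right-hand side of (46) equals

$$\int_{-\infty}^{\sqrt{V_P}\Phi^{-1}(1-\epsilon)}\frac{\sqrt{V_P}\,\Phi^{-1}(1-\epsilon)-x}{\sqrt{2\pi V_P}}e^{-x^2/(2V_P)}\,dx.$$

Finally, we compare the topologies defined by the following limits:

$$d(p_n\circ\phi_n^{-1}, p_{U,\mathcal{M}_n}) \to 0, \tag{47}$$

$$\frac{1}{n}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \to 0, \tag{48}$$

$$\frac{1}{\sqrt{n}}D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \to 0, \tag{49}$$

$$D(p_n\circ\phi_n^{-1}\,\|\,p_{U,\mathcal{M}_n}) \to 0, \tag{50}$$

$$D(p_{U,\mathcal{M}_n}\,\|\,p_n\circ\phi_n^{-1}) \to 0. \tag{51}$$

The relations

$$(50) \Rightarrow (49) \Rightarrow (48), \tag{52}$$

$$(50) \Rightarrow (47) \Rightarrow (48), \tag{53}$$

$$(51) \Rightarrow (47) \tag{54}$$

hold. The relation (52) is trivial, and the first relation of (53) and the relation (54) follow from Pinsker’s inequality. For the second relation of (53), see the Appendix.

That is, (48) gives the weakest topology among the above topologies. Thus, there is no contradiction even if Folklore for source coding holds in (48) but does not hold in (49), (51), or (47).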
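For the i.i.d. formulas of this subsection, the integral $\int_{-\infty}^{b}(b-x)F(dx)$ with $F$ the normal distribution of mean 0 and variance $V_P$ has the closed form $b\,\Phi(b/\sqrt{V_P}) + \sqrt{V_P}\,\phi(b/\sqrt{V_P})$, where $\phi$ is the standard normal density. The sketch below uses this to compute $S^*(\delta,H(P)|\mathbf{P})$ by bisection (the integral is continuous and strictly increasing in $b$), $S^*_1(\delta,H(P)|\mathbf{P})$ in closed form, and the two right-hand sides of Corollary 3. The function names are hypothetical, and `v` stands for $V_P$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_shortfall(b, v):
    """integral_{-inf}^{b} (b - x) N(0, v)(dx) = b*Phi(b/sigma) + sigma*phi(b/sigma)."""
    sigma = math.sqrt(v)
    z = b / sigma
    return b * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)


def s_star_second_order(delta, v, iters=200):
    """S*(delta, H(P)|P) = sup{b | gaussian_shortfall(b, v) <= delta}, by bisection."""
    lo, hi = -1.0, 1.0
    while gaussian_shortfall(lo, v) > delta:   # move lo left until feasible
        lo *= 2.0
    while gaussian_shortfall(hi, v) <= delta:  # move hi right until infeasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_shortfall(mid, v) <= delta:
            lo = mid
        else:
            hi = mid
    return lo


def s_star_1_second_order(delta, v):
    """S*_1(delta, H(P)|P) = sqrt(V_P) * Phi^{-1}(1 - e^{-delta})."""
    return math.sqrt(v) * STD_NORMAL.inv_cdf(1.0 - math.exp(-delta))


def corollary3_bounds(eps, v):
    """i.i.d. right-hand sides of Corollary 3: the value of (46) and -log(eps)."""
    b0 = math.sqrt(v) * STD_NORMAL.inv_cdf(1.0 - eps)
    return gaussian_shortfall(b0, v), -math.log(eps)
```

Both second order rates decrease without bound as $\delta \to 0$, which is the numerical counterpart of the statement that neither (44) nor (45) can hold for codes with vanishing error.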











VII. Markovian case 


Now, we proceed to the Markovian case with irreducible 
transition matrix Q t . % , where i indicates the input signal and 
j does the output signal. When the initial distribution is the 
stationary distribution Pi, which is the eigen vector of Qji 
with eigen value 1, the average H n (Q ) of the normalized 
likelihood can be calculated as 


H n (Q) 

E»!,... ,i n log Qin.in —1 ' ' ' Q^ 2 ,ill’ll 
n 

^ Pin — lQin,in — l ^^Qin.in—1 "F * * * 

n ' 

2 n _l,2 n 

4 ^ , PiiQi 2 ,ii log Qi 2 ,u 4 ^ ' Pj-i log Pi^ 

*1,*2 U 

— ^ ' Pi log Pi ^ PiQj,i log Qj i 

1 J,* 

:=-53^0j,ilogQj,i. 


The limit of the third cumulant is calculated as 


Eji,. 

/ — log Qi n ,i n _! ■ 

■■ Qi 2 ,h p ii ~nH n (Q)\ 3 


^ J 

=E il „ 

f X {ini in— l) H"~ 

• • ■ + X(i 2 , i 1 ) + Y ( i \) \ 


7 n J 

=Ej li . 

..,i n /— ( X {ini in— 1 

riyjn V 

) 3 H- + X(i 2 ,ii) 3 + Y(ii) 3 


4 3(A (i n . i n —i) X(in— l ) in— 2 ) T * * * 

+ -X"(*3) *2) 2 ^(*2; * 1 ) + X(i 2 , *l) 2 ^(*l)) 


4 3(A(i n , * n _i)A i n — 2 )^ + ■ ■ • 

+ X(i 3) t 2 )X(* 2 , 4) 2 + Xfe, ti)F(ii)) 2 
4 2(A(^ n ,i n _ 3 )A(i n _i, 2 )A (i n — 2 , t n _ 3 ) -f 

+ A(*4, * 3 )X(i 3 , *2)A(i2, «i) + A(j 3 , «2)A(*2, ii)V 

-> 0 . 

Similarly, for n > 3, the n-th cumulant goes to 0 because the 
numerator is linear for n while the denominator is a higher 
term for n. Thus, the limit distribution of the normalized 
likelihood is equal to the normal distribution with average 
H(Q) and the variance V(Q). Hence, concerning the first 
order asymptotics, we have 

H(0\Q) = H + (1\Q) = H(Q), (55) 


where is the expectation concerning the distribution 

Qi n ,i n —1 * * * Q12 - i: i ( 1 ■ 

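In order to treat the limit distribution of the normalized likelihood, we calculate the second cumulant as

$$
\begin{aligned}
&\mathbb{E}_{i_1,\dots,i_n}\Big(\frac{-\log Q_{i_n,i_{n-1}}\cdots Q_{i_2,i_1}P_{i_1} - nH_n(Q)}{\sqrt{n}}\Big)^2 \\
&= \mathbb{E}_{i_1,\dots,i_n}\Big(\frac{X(i_n,i_{n-1})+\cdots+X(i_2,i_1)+Y(i_1)}{\sqrt{n}}\Big)^2 \\
&= \mathbb{E}_{i_1,\dots,i_n}\frac{1}{n}\Big(X(i_n,i_{n-1})^2+\cdots+X(i_2,i_1)^2+Y(i_1)^2 \\
&\quad + 2X(i_n,i_{n-1})X(i_{n-1},i_{n-2})+\cdots+2X(i_3,i_2)X(i_2,i_1)+2X(i_2,i_1)Y(i_1)\Big) \\
&\to V(Q),
\end{aligned}
$$

where $\mathbf{Q} = \{Q^n\}$ and $Q^n_{i_n,\dots,i_1} = Q_{i_n,i_{n-1}}\cdots Q_{i_2,i_1}P_{i_1}$. Hence, concerning the second order asymptotics, we have

$$ \underline{H}(\epsilon, H(Q)|\mathbf{Q}) = \overline{H}(\epsilon, H(Q)|\mathbf{Q}) = \sqrt{V(Q)}\,\Phi^{-1}(\epsilon). \tag{56} $$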
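As a small numerical aid (not part of the original analysis), the Gaussian right-hand side of (56) can be evaluated once the variance is known. The sketch below assumes that (56) has the form $\sqrt{V}\,\Phi^{-1}(\epsilon)$ with $\Phi$ the standard normal distribution function; the function name `second_order_coefficient` is hypothetical.

```python
import math
from statistics import NormalDist

def second_order_coefficient(V: float, eps: float) -> float:
    """Return sqrt(V) * Phi^{-1}(eps), with Phi the standard normal CDF.

    V   : asymptotic variance of the normalized log-likelihood (e.g. V(Q))
    eps : level in (0, 1)
    The returned b is the solution of Phi(b / sqrt(V)) = eps.
    """
    if V < 0 or not 0.0 < eps < 1.0:
        raise ValueError("need V >= 0 and 0 < eps < 1")
    return math.sqrt(V) * NormalDist().inv_cdf(eps)

print(second_order_coefficient(0.25, 0.9))
```

The sign of the coefficient follows the sign of $\Phi^{-1}(\epsilon)$: levels below $1/2$ give a negative second order term.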


Next, we consider the case where the initial distribution is different from the stationary distribution $P_i$. In this case, the distribution of the $n$-th datum approaches the stationary distribution $P_i$ exponentially fast [13]. Hence, the limit distribution of the normalized likelihood is again the normal distribution with average $H(Q)$ and variance $V(Q)$. Therefore, as in the i.i.d. case, Folklore for source coding does not hold for the topology (47), (48), or (49) in the Markovian case.

Further, by using the earlier remark, $S^2(\delta|P)$ is calculated as follows. In the Markovian case, $\psi(s)$ is expressed as the logarithm of a sum involving the distribution $P_{s;j}$, which consists of eigenvectors of a matrix determined by $Q_{j,i}$ (Section 3 of Dembo & Zeitouni [13]). Hence, we obtain

$$ S^2(\delta|P) = \min_{0<s<1}\frac{s\delta+\psi(s)}{1-s}. \tag{57} $$


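In the cumulant calculations above, $X(i_{k+1},i_k) := -\log Q_{i_{k+1},i_k} - H(Q)$, $Y(i_1) := -\log P_{i_1} - H(P)$, and

$$ V(Q) := \sum_{j,i}Q_{j,i}P_i\big(-\log Q_{j,i}-H(Q)\big)^2 + 2\sum_{k,j,i}Q_{k,j}Q_{j,i}P_i\big(-\log Q_{k,j}-H(Q)\big)\big(-\log Q_{j,i}-H(Q)\big). $$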

VIII. Universal fixed-length source coding and universal intrinsic randomness

In this section, we focus only on independent and identically distributed information sources. In this case, as was shown by Csiszár and Körner [14], there exists a fixed-length source code that attains the first order optimal rate and does not depend on the probability distribution of the information source, whereas the codes constructed above attain the optimal bound but depend on the distribution. Such a code is called universal, and universality is an important topic in information theory. Indeed, the information spectrum method can be applied to any sequence of information sources, but it gives a code that depends on the information source. In contrast, a universal code assumes an independent and identically distributed (or Markovian) information source, but it depends only on the coding rate, not on the information source. As stated in the following theorem, there exists a universal fixed-length source code attaining the second order optimal rate.

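Theorem 10: Assume that the cardinality of $\Omega$ is a finite number $d$. Then there exists a fixed-length source code $\Phi_n$ on $\Omega^n$ such that

$$ \lim \frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}} = b \tag{58} $$

and

$$ \lim \varepsilon_{P^n}(\Phi_n) = \begin{cases} 0 & H(P) < a \\ 1-\Phi\Big(\dfrac{b}{\sqrt{V_P}}\Big) & H(P) = a. \end{cases} \tag{59} $$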

The error probability of universal fixed-length source codes had not been discussed when the rate equals the entropy of the information source. This theorem clarifies the asymptotic behavior of the error probability in this special case by treating the second order asymptotics.
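The type-based code behind Theorem 10 (constructed in the proof of Theorem 10 below) is easy to evaluate exactly for a binary alphabet, where the type class with $k$ ones has size $\binom{n}{k}$. The following Python sketch is an illustration under stated assumptions (binary $\Omega$, Bernoulli($p$) source, natural logarithms, a hypothetical function name): it keeps every type class with $\log|T_P| \le na + b\sqrt{n}$ and reports the code size together with the exact error probability $1 - P^n(T_n(a,b))$.

```python
import math

def universal_binary_code(n: int, a: float, b: float, p: float):
    """Type-based fixed-length code for a binary i.i.d. source (illustration).

    Keeps every type class T_k (sequences with k ones, |T_k| = C(n, k)) whose
    log-size is at most n*a + b*sqrt(n); all other sequences share one extra
    'failure' codeword. Returns (log |T_n(a, b)|, error probability
    1 - P^n(T_n(a, b)) under Bernoulli(p)).
    """
    threshold = n * a + b * math.sqrt(n)
    kept = [k for k in range(n + 1) if math.log(math.comb(n, k)) <= threshold]
    if not kept:
        return float("-inf"), 1.0

    def type_prob(k):
        # P^n(T_k) = C(n, k) p^k (1 - p)^(n - k), evaluated in log space
        return math.exp(math.log(math.comb(n, k)) + k * math.log(p)
                        + (n - k) * math.log1p(-p))

    log_size = math.log(sum(math.comb(n, k) for k in kept))
    error = max(0.0, 1.0 - sum(type_prob(k) for k in kept))
    return log_size, error

# H(P) for Bernoulli(0.11) is about 0.347 nats, so a = 0.5 exceeds it.
print(universal_binary_code(200, 0.5, 0.0, 0.11))
```

Note that the code never uses $p$ to decide which sequences to keep; $p$ only enters the error evaluation, which is the sense in which the code is universal. Setting $a$ equal to the binary entropy of $p$ reproduces the boundary case of (59), although finite-$n$ values need not be close to the limit.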

Concerning intrinsic randomness, while Oohama and Sugano [15] proved that there exists an operation universally attaining the first order optimal rate, we can also prove the existence of a universal operation achieving the second order optimal rate.

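Theorem 11: Assume that the cardinality of $\Omega$ is a finite number $d$. Then there exists an operation $\Psi_n$ on $\Omega^n$ such that

$$ \lim \frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} = b \tag{60} $$

and

$$ \lim \varepsilon_{P^n}(\Psi_n) = \begin{cases} 0 & H(P) > a \\ \Phi\Big(\dfrac{b}{\sqrt{V_P}}\Big) & H(P) = a. \end{cases} \tag{61} $$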


IX. Proof of theorems

First, we give proofs of Theorems 1 and 2, which are partially known. Following these proofs, we give our proof of Theorem 3, which is the main result of this paper; the former proofs are preliminaries to the proof of Theorem 3. After these proofs, we give proofs of the remaining theorems.


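A. Proof of Theorem 1

Lemma 1 (Han [4, Lemma 1.3.1]): For any integer $M_n$, there exists a code $\Phi_n$ satisfying

$$ 1-\varepsilon(\Phi_n) \ge p_n\{p_n(\omega) > \tfrac{1}{M_n}\}, \qquad |\Phi_n| \le M_n. \tag{62} $$

Lemma 2 (Han [4, Lemma 1.3.2]): Any integer $M'_n$ and any code $\Phi_n$ satisfy the following condition:

$$ 1-\varepsilon(\Phi_n) \le p_n\{p_n(\omega) > \tfrac{1}{M'_n}\} + \frac{|\Phi_n|}{M'_n}. $$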
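The construction behind Lemma 1 is simple enough to check by brute force on a toy source: fewer than $M_n$ outcomes can have $p_n(\omega) > 1/M_n$, so each of them can receive its own codeword, and all remaining outcomes share a single codeword. The Python sketch below is a toy illustration (an i.i.d. Bernoulli source and hypothetical helper names are assumed; it is not Han's construction verbatim). It builds such a code and checks (62) together with the converse bound of Lemma 2 for the choice $M'_n = 4M_n$.

```python
import itertools
import math

def iid_bernoulli(n, p):
    """All binary sequences of length n with their i.i.d. Bernoulli(p) probabilities."""
    return {w: math.prod(p if x else 1 - p for x in w)
            for w in itertools.product((0, 1), repeat=n)}

def heavy_mass(probs, M):
    """p_n{p_n(omega) > 1/M}."""
    return sum(q for q in probs.values() if q > 1.0 / M)

def lemma1_code(probs, M):
    """Distinct codewords for outcomes with probability > 1/M, one shared
    codeword for the rest. Returns (code size, success probability).

    The decoder inverts the injective part; the shared codeword is counted
    as a decoding error (a conservative choice).
    """
    heavy = [w for w, q in probs.items() if q > 1.0 / M]
    size = len(heavy) + (1 if len(heavy) < len(probs) else 0)
    success = sum(probs[w] for w in heavy)
    return size, success

probs = iid_bernoulli(12, 0.2)
M = 2 ** 6
size, success = lemma1_code(probs, M)
print(size, success)
```

Because the heavy outcomes have total probability at most 1, there are at most $M_n - 1$ of them, so the shared codeword never pushes the size above $M_n$.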

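By using these lemmas and the following expressions of the quantities $R(1-\epsilon|p)$, $R^\dagger(1-\epsilon|p)$, and $R^\ddagger(1-\epsilon|p)$, we will prove Theorem 1:

$$
\begin{aligned}
R(1-\epsilon|p) &= \inf_{\{\Phi_n\}}\Big\{\limsup\frac{1}{n}\log|\Phi_n| \;\Big|\; \liminf\,\big(1-\varepsilon(\Phi_n)\big) > \epsilon\Big\},\\
R^\dagger(1-\epsilon|p) &= \inf_{\{\Phi_n\}}\Big\{\limsup\frac{1}{n}\log|\Phi_n| \;\Big|\; \limsup\,\big(1-\varepsilon(\Phi_n)\big) > \epsilon\Big\},\\
R^\ddagger(1-\epsilon|p) &= \inf_{\{\Phi_n\}}\Big\{\liminf\frac{1}{n}\log|\Phi_n| \;\Big|\; \liminf\,\big(1-\varepsilon(\Phi_n)\big) > \epsilon\Big\}.
\end{aligned}
$$

Proof of direct part: For any real number $a > \overline{H}(\epsilon|p)$, by applying Lemma 1 to the case of $M_n = e^{na}$, we can show

$$ \liminf p_n\{p_n(\omega) > \tfrac{1}{M_n}\} = \liminf p_n\{-\tfrac{1}{n}\log p_n(\omega) < a\} > \epsilon, \tag{63} $$

which implies that $a \ge R(1-\epsilon|p)$. Thus, we obtain $\overline{H}(\epsilon|p) \ge R(1-\epsilon|p)$.

By replacing the limit $\liminf$ in (63) by $\limsup$, we can show $\underline{H}(\epsilon|p) \ge R^\dagger(1-\epsilon|p)$.

Finally, by choosing $M_n$ satisfying

$$ \liminf p_n\{-\tfrac{1}{n}\log p_n(\omega) < \tfrac{1}{n}\log M_n\} > \epsilon, \qquad \liminf\tfrac{1}{n}\log M_n = a > \underline{H}(\epsilon|p), $$

we can prove $\underline{H}(\epsilon|p) \ge R^\ddagger(1-\epsilon|p)$.

The direct parts of the corresponding statements with $\ge\epsilon$ can be proved by replacing $>\epsilon$ by $\ge\epsilon$ in the above proof. ■

Proof of converse part: First, we prove

$$ \overline{H}(\epsilon|p) \le R(1-\epsilon|p). \tag{64} $$

Assume that $a = \limsup\frac{1}{n}\log|\Phi_n|$ and $\liminf\,(1-\varepsilon(\Phi_n)) > \epsilon$. For any real number $\delta > 0$, we apply Lemma 2 to the case of $M'_n = e^{n(a+\delta)}$. Then, we obtain

$$ p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\delta\} \ge 1-\varepsilon(\Phi_n) - \frac{|\Phi_n|}{e^{n(a+\delta)}}. \tag{65} $$

Taking the limit $\liminf$, we can show

$$ \liminf p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\delta\} > \epsilon. $$

From this relation, we obtain $a+\delta \ge \overline{H}(\epsilon|p)$, which implies (64).

Similarly, taking the limit $\limsup$ in (65), we can prove $\underline{H}(\epsilon|p) \le R^\dagger(1-\epsilon|p)$.

Finally, we focus on a subsequence $n_k$ satisfying $a = \liminf\frac{1}{n}\log|\Phi_n| = \lim_k\frac{1}{n_k}\log|\Phi_{n_k}|$. By using (65), we obtain

$$ \limsup_n p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\delta\} \ge \liminf_k p_{n_k}\{-\tfrac{1}{n_k}\log p_{n_k}(\omega) < a+\delta\} \ge \liminf_k\,\big(1-\varepsilon(\Phi_{n_k})\big) > \epsilon. $$

Taking into account the above discussions, we can prove $\underline{H}(\epsilon|p) \le R^\ddagger(1-\epsilon|p)$.

Similarly, the converse parts of the corresponding statements with $\ge\epsilon$ can be proved by replacing $>\epsilon$ by $\ge\epsilon$ in the above proof. ■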
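B. Proof of Theorem 2

Lemma 3 (Han [4, Lemma 2.1.1]): For any integers $M'_n$ and $M_n$, there exists an operation $\Psi_n = (\mathcal{M}_n, \phi_n)$ satisfying

$$ \varepsilon(\Psi_n) \le p_n\{p_n(\omega) > \tfrac{1}{M'_n}\} + \sqrt{\frac{M_n}{M'_n}}, \qquad |\Psi_n| = M_n. \tag{66} $$

Lemma 4 (Han [4, Lemma 2.1.2]): Any integer $M'_n$ and any operation $\Psi_n$ satisfy

$$ \varepsilon(\Psi_n) \ge p_n\{p_n(\omega) > \tfrac{1}{M'_n}\} - \frac{M'_n}{|\Psi_n|}. $$

By using these lemmas, we prove Theorem 2.

Proof of direct part: For any real numbers $a < \underline{H}(\epsilon|p)$ and $\delta > 0$, we apply Lemma 3 to the case of $M_n = e^{n(a-\delta)}$ and $M'_n = e^{na}$ as follows:

$$ \limsup p_n\{p_n(\omega) > \tfrac{1}{M'_n}\} = \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < a\} < \epsilon. \tag{67} $$

Since $\sqrt{M_n/M'_n} \to 0$, we obtain $\limsup\varepsilon(\Psi_n) < \epsilon$, which implies that $a-\delta \le S(\epsilon|p)$. Thus, the inequality $\underline{H}(\epsilon|p) \le S(\epsilon|p)$ holds. Moreover, by replacing the limit in (67) by $\liminf$, we can prove $\overline{H}(\epsilon|p) \le S^\dagger(\epsilon|p)$.

Finally, by choosing $M'_n$ satisfying

$$ \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < \tfrac{1}{n}\log M'_n\} < \epsilon, \qquad \limsup\tfrac{1}{n}\log M'_n = a < \overline{H}(\epsilon|p), $$

we can prove $\overline{H}(\epsilon|p) \le S^\ddagger(\epsilon|p)$.

The direct part of the corresponding statement with $\le\epsilon$ can be proved by replacing $<\epsilon$ by $\le\epsilon$ in the above proof. ■

Proof of converse part: First, we prove

$$ \underline{H}(\epsilon|p) \ge S(\epsilon|p). \tag{68} $$

Assume that $a = \liminf\frac{1}{n}\log|\Psi_n|$ and $\limsup\varepsilon(\Psi_n) < \epsilon$. For any real number $\delta > 0$, we apply Lemma 4 to the case of $M'_n = e^{n(a-\delta)}$. Then, we obtain

$$ p_n\{-\tfrac{1}{n}\log p_n(\omega) < a-\delta\} \le \varepsilon(\Psi_n) + \frac{e^{n(a-\delta)}}{|\Psi_n|}. \tag{69} $$

Taking the limit $\limsup$, we can show that

$$ \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < a-\delta\} < \epsilon. $$

Thus, we obtain $a-\delta \le \underline{H}(\epsilon|p)$, which implies (68).

Similarly, by taking the limit $\liminf$ in the inequality (69), we obtain $\overline{H}(\epsilon|p) \ge S^\dagger(\epsilon|p)$.

Moreover, by focusing on a subsequence $n_k$ satisfying $a = \limsup\frac{1}{n}\log|\Psi_n| = \lim_k\frac{1}{n_k}\log|\Psi_{n_k}|$, we can show the following relations from (69):

$$ \liminf_n p_n\{-\tfrac{1}{n}\log p_n(\omega) < a-\delta\} \le \limsup_k p_{n_k}\{-\tfrac{1}{n_k}\log p_{n_k}(\omega) < a-\delta\} \le \limsup_k\varepsilon(\Psi_{n_k}), $$

which implies that $\overline{H}(\epsilon|p) \ge S^\ddagger(\epsilon|p)$.

Similarly, the converse part of the corresponding statement with $\le\epsilon$ can be proved by replacing $<\epsilon$ by $\le\epsilon$ in the above proof. ■


C. Proof of Theorem 3

For any real number $b > \overline{H}(\epsilon,a|p)$, by applying Lemma 1 to the case of $M_n = e^{na+\sqrt{n}b}$, we can show

$$ \liminf p_n\{p_n(\omega) > \tfrac{1}{M_n}\} = \liminf p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b}{\sqrt{n}}\} > \epsilon, $$

which implies $b \ge R(1-\epsilon,a|p)$. Thus, we obtain $\overline{H}(\epsilon,a|p) \ge R(1-\epsilon,a|p)$. Similarly to the proof of Theorem 1, we can show

$$ \underline{H}(\epsilon,a|p) \ge R^\dagger(1-\epsilon,a|p), \qquad \underline{H}(\epsilon,a|p) \ge R^\ddagger(1-\epsilon,a|p). $$

Next, we prove

$$ \overline{H}(\epsilon,a|p) \le R(1-\epsilon,a|p). \tag{70} $$

Assume that $b = \limsup\frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}}$ and $\liminf\,(1-\varepsilon(\Phi_n)) > \epsilon$. For any real number $\delta > 0$, we apply Lemma 2 to the case of $M'_n = e^{na+\sqrt{n}(b+\delta)}$. Then, we obtain

$$ p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+\delta}{\sqrt{n}}\} \ge 1-\varepsilon(\Phi_n) - \frac{|\Phi_n|}{e^{na+\sqrt{n}(b+\delta)}}. $$

Taking the limit $\liminf$, we obtain

$$ \liminf p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+\delta}{\sqrt{n}}\} > \epsilon, $$

which implies $b+\delta \ge \overline{H}(\epsilon,a|p)$. Thus, we obtain (70). Therefore, similarly to our proof of Theorem 1, we can show

$$ \underline{H}(\epsilon,a|p) \le R^\dagger(1-\epsilon,a|p), \qquad \underline{H}(\epsilon,a|p) \le R^\ddagger(1-\epsilon,a|p). $$

Next, we prove

$$ \underline{H}(\epsilon,a|p) \le S(\epsilon,a|p). \tag{71} $$

For any real numbers $b < \underline{H}(\epsilon,a|p)$ and $\delta > 0$, we apply Lemma 3 to the case of $M_n = e^{na+\sqrt{n}(b-\delta)}$ and $M'_n = e^{na+\sqrt{n}b}$. Since

$$ \limsup p_n\{p_n(\omega) > \tfrac{1}{M'_n}\} = \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b}{\sqrt{n}}\} < \epsilon $$

and $\sqrt{M_n/M'_n} = e^{-\sqrt{n}\delta/2} \to 0$, we obtain $\limsup\varepsilon(\Psi_n) < \epsilon$, which implies $b-\delta \le S(\epsilon,a|p)$. Thus, we obtain (71). Similarly to our proof of Theorem 2, we can show

$$ \overline{H}(\epsilon,a|p) \le S^\dagger(\epsilon,a|p), \qquad \overline{H}(\epsilon,a|p) \le S^\ddagger(\epsilon,a|p). $$

Finally, we prove

$$ \underline{H}(\epsilon,a|p) \ge S(\epsilon,a|p). \tag{72} $$

Assume that $b = \liminf\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}}$ and $\limsup\varepsilon(\Psi_n) < \epsilon$. For any real number $\delta > 0$, we apply Lemma 4 to the case of $M'_n = e^{na+\sqrt{n}(b-\delta)}$. Then, the inequality

$$ p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b-\delta}{\sqrt{n}}\} \le \varepsilon(\Psi_n) + \frac{e^{na+\sqrt{n}(b-\delta)}}{|\Psi_n|} $$

holds. Taking the limit $\limsup$, we obtain

$$ \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b-\delta}{\sqrt{n}}\} < \epsilon, $$

which implies $b-\delta \le \underline{H}(\epsilon,a|p)$. Thus, the relation (72) holds. Similarly to our proof of Theorem 2, the inequalities

$$ \overline{H}(\epsilon,a|p) \ge S^\dagger(\epsilon,a|p), \qquad \overline{H}(\epsilon,a|p) \ge S^\ddagger(\epsilon,a|p) $$

are proved. ■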


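D. Proof of Theorem 4

We define the subset $\mathcal{M}'_n$ of $\mathcal{M}_n$ as

$$ \mathcal{M}'_n := \{i\in\mathcal{M}_n \mid \psi_n(i)\in\phi_n^{-1}(i)\}. $$

Since the relation $\phi_n^{-1}(i)\cap\phi_n^{-1}(j) = \emptyset$ holds for any distinct integers $i,j$, the map $\psi_n$ is injective on $\mathcal{M}'_n$. Thus, $p_n$ can be regarded as a probability distribution on $\mathcal{M}'_n\cup(\Omega_n\setminus\psi_n(\mathcal{M}'_n)) \subset \mathcal{M}_n\cup(\Omega_n\setminus\psi_n(\mathcal{M}'_n))$. Similarly, $p_n\circ\phi_n^{-1}$ can also be regarded as a probability distribution on $\mathcal{M}_n \subset \mathcal{M}_n\cup(\Omega_n\setminus\psi_n(\mathcal{M}'_n))$.

Then, the relation

$$ d(p_n, p_{U,\mathcal{M}'_n}) \le d(p_n, p_{U,\mathcal{M}_n}) $$

holds. The definition of $\delta(p_n)$ guarantees that $\delta(p_n) \le d(p_n, p_{U,\mathcal{M}'_n})$.

The triangle inequality for the distance yields

$$ d(p_n, p_{U,\mathcal{M}_n}) \le d(p_n, p_n\circ\phi_n^{-1}) + d(p_n\circ\phi_n^{-1}, p_{U,\mathcal{M}_n}). $$

Furthermore, the quantity $\varepsilon(\Phi_n)$ has another expression:

$$ \varepsilon(\Phi_n) = p_n(\Omega_n\setminus\psi_n(\mathcal{M}'_n)). $$

Since the set $\Omega_n\setminus\psi_n(\mathcal{M}'_n)$ coincides with the set of elements of $\mathcal{M}_n\cup(\Omega_n\setminus\psi_n(\mathcal{M}'_n))$ at which the probability $p_n$ is greater than the probability $p_n\circ\phi_n^{-1}$, the equation

$$ d(p_n, p_n\circ\phi_n^{-1}) = p_n(\Omega_n\setminus\psi_n(\mathcal{M}'_n)) = \varepsilon(\Phi_n) $$

holds.

Combining the above relations, we obtain

$$ \delta(p_n) \le \varepsilon(\Phi_n) + \varepsilon(\Psi_n). $$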


E. Proof of Theorem 5

First, we construct a sequence of codes $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ satisfying (28) and (29) as follows. We define

$$ S_n(a,b) := \{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b}{\sqrt{n}}\}, \qquad \tilde{M}_n := |S_n(a,b)|, $$

and denote the one-to-one map from $S_n(a,b)$ to $\tilde{\mathcal{M}}_n = \{1,\dots,\tilde{M}_n\}$ by $\tilde\phi_n$.

Then, the inequality $\tilde{M}_n \le e^{na+\sqrt{n}b}$ holds. Next, we define $\epsilon_n := p_n(S_n(a,b))$ and focus on the probability distribution $\hat p_n(\omega) := \frac{p_n(\omega)}{1-\epsilon_n}$ on $S_n(a,b)^c$. Then, we apply Lemma 3 to the case of $M'_n = \hat{M}'_n := (1-\epsilon_n)e^{na+\sqrt{n}(b+2\gamma_n)}$, $M_n = \hat{M}_n := (1-\epsilon_n)e^{na+\sqrt{n}(b+\gamma_n)}$, $\gamma_n = 1/n^{1/4}$, and denote the transformation satisfying the condition of Lemma 3 by $\hat\phi_n$, where the range of $\hat\phi_n$ is $\{\tilde{M}_n+1,\dots,\tilde{M}_n+\hat{M}_n\}$. Half of the variational distance between $\hat p_n\circ\hat\phi_n^{-1}$ and the uniform distribution is less than

$$ \hat p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+2\gamma_n}{\sqrt{n}}\} + e^{-\sqrt{n}\gamma_n/2}. \tag{73} $$

Next, we define a code $\Phi_n = (\mathcal{M}_n, \phi_n, \psi_n)$ with the size $M_n = e^{na+\sqrt{n}(b+\gamma_n)}$ as follows. The encoding $\phi_n$ is defined by $\tilde\phi_n$ and $\hat\phi_n$. The decoding $\psi_n$ is defined as the inverse map on the subset $\tilde{\mathcal{M}}_n$ of $\mathcal{M}_n$, and as an arbitrary map on the complement set $\tilde{\mathcal{M}}_n^c$. Since

$$ 1-\varepsilon(\Phi_n) \ge p_n(S_n(a,b)) = \epsilon_n, \tag{74} $$

we obtain the first inequality of (28).

Since the variational distance equals the sum of the part on the range of $\hat\phi_n$ and the part on the complement of that range, $\varepsilon(\Psi_n)$ can be evaluated as follows:

$$
\begin{aligned}
\varepsilon(\Psi_n) &\le (1-\epsilon_n)\Big(\hat p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+2\gamma_n}{\sqrt{n}}\} + e^{-\sqrt{n}\gamma_n/2}\Big) + p_n(S_n(a,b)) \\
&= (1-\epsilon_n)e^{-\sqrt{n}\gamma_n/2} + p_n(S_n(a,b)) + p_n\Big(S_n(a,b)^c\cap\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+2\gamma_n}{\sqrt{n}}\}\Big) \\
&= (1-\epsilon_n)e^{-\sqrt{n}\gamma_n/2} + p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{b+2\gamma_n}{\sqrt{n}}\},
\end{aligned}
$$

where we use the relation $S_n(a,b) \subset \{-\frac{1}{n}\log p_n(\omega) < a+\frac{b+2\gamma_n}{\sqrt{n}}\}$. Since the definition of $M_n$ guarantees the condition (29), the proof is completed. ■

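F. Proof of Theorem 6

Proof of the inequality:

Lemma 5: The following relation holds for any operation $(\mathcal{M}_n, \phi_n)$:

$$ H(p_n\circ\phi_n^{-1}) \le H(M_n,p_n) + p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\log M_n - \log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big), $$

where

$$ H(M_n,p_n) := -\sum_{\omega:\,p_n(\omega) > \frac{1}{M_n}} p_n(\omega)\log p_n(\omega). \tag{75} $$

Proof: Define the set $\mathcal{M}'_n$ and the map $\phi'_n$ from $\Omega_n$ to $\mathcal{M}'_n$ as follows:

$$ \mathcal{M}'_n := \mathcal{M}_n\cup\{p_n(\omega) > \tfrac{1}{M_n}\}, \tag{76} $$

$$ \phi'_n(\omega) := \begin{cases}\phi_n(\omega) & p_n(\omega)\le\frac{1}{M_n}\\ \omega & p_n(\omega) > \frac{1}{M_n}.\end{cases} \tag{77} $$

Since

$$ -\sum_{i=1}^{M_n} p_n\circ\phi_n'^{-1}(i)\log p_n\circ\phi_n'^{-1}(i) \le p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\log M_n - \log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big), $$

the inequality

$$ H(p_n\circ\phi_n'^{-1}) \le H(M_n,p_n) + p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\log M_n - \log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big) $$

holds. When the map $\phi''_n$ from $\mathcal{M}'_n$ to $\mathcal{M}_n$ is defined by

$$ \phi''_n(x) := \begin{cases} x & x\in\mathcal{M}_n\\ \phi_n(x) & x\in\{p_n(\omega) > \frac{1}{M_n}\},\end{cases} $$

the relation $\phi_n = \phi''_n\circ\phi'_n$ holds. Thus, $\phi_n^{-1} = \phi_n'^{-1}\circ\phi_n''^{-1}$. Generally, any map $f$ and any distribution $Q$ satisfy

$$ H(Q\circ f^{-1}) = -\sum_y\sum_{x:\,y=f(x)} Q(x)\log\Big(\sum_{x':\,y=f(x')}Q(x')\Big) \le -\sum_y\sum_{x:\,y=f(x)} Q(x)\log Q(x) = H(Q). $$

Hence,

$$ H(p_n\circ\phi_n^{-1}) = H(p_n\circ\phi_n'^{-1}\circ\phi_n''^{-1}) \le H(p_n\circ\phi_n'^{-1}). $$

Therefore, the proof is completed. ■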
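Lemma 5 holds for every map $\phi_n$ into $M_n$ symbols, so it can be spot-checked numerically. The sketch below is an illustration only: it assumes a toy i.i.d. Bernoulli source and a pseudo-random map, and all names are hypothetical. It computes both sides of Lemma 5 and also checks the fact $H(Q\circ f^{-1}) \le H(Q)$ used at the end of the proof.

```python
import itertools
import math
import random
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (nats) of a dict outcome -> probability."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X ~ dist, i.e. dist composed with f^{-1}."""
    out = defaultdict(float)
    for x, q in dist.items():
        out[f(x)] += q
    return dict(out)

def lemma5_rhs(dist, M):
    """H(M, p) + p{p <= 1/M} (log M - log p{p <= 1/M}), as in Lemma 5 and (75)."""
    heavy_entropy = -sum(q * math.log(q) for q in dist.values() if q > 1.0 / M)
    light = sum(q for q in dist.values() if q <= 1.0 / M)
    light_term = light * (math.log(M) - math.log(light)) if light > 0 else 0.0
    return heavy_entropy + light_term

# Toy source: 10 i.i.d. Bernoulli(0.3) bits, mapped pseudo-randomly into M symbols.
n, p, M = 10, 0.3, 50
dist = {w: math.prod(p if x else 1 - p for x in w)
        for w in itertools.product((0, 1), repeat=n)}
rng = random.Random(0)
table = {w: rng.randrange(M) for w in dist}
image = pushforward(dist, table.__getitem__)
print(entropy(image), lemma5_rhs(dist, M), entropy(dist))
```

Any other map into $M$ symbols (for example a different seed) should satisfy the same inequality, since the proof does not use any property of $\phi_n$ beyond its range size.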

We define the probability distribution function $F_n$ on the real numbers $\mathbb{R}$ as

$$ F_n(x) := p_n\{-\tfrac{1}{n}\log p_n(\omega) < x\} \tag{78} $$

for a probability distribution $p_n$. Then, the relation

$$ \frac{1}{n}H(M_n,p_n) = \int_0^{\frac{1}{n}\log M_n} x\,F_n(dx) \tag{79} $$

holds. Thus, Lemma 5 yields the inequality

$$ \frac{1}{n}H(p_n\circ\phi_n^{-1}) \le \int_0^{\frac{1}{n}\log M_n} x\,F_n(dx) + \frac{1}{n}p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\log M_n - \log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big). $$

Taking the limit, we obtain the desired inequality. ■

Proof of the existence part:

Lemma 6 (Han [4, Equation (2.2.4)]): For any integers $M_n$ and $M'_n$, there exists an operation $\Psi_n = (\mathcal{M}_n,\phi_n)$ such that

$$ D(p_n\circ\phi_n^{-1}\|p_{U,M_n}) \le \log M_n\Big(\frac{M'_n}{M_n} + \frac{1}{M'_n} + p_n\{p_n(\omega) > \tfrac{1}{M_n}\}\Big), \qquad |\Psi_n| = M_n. $$

Remark 6: Han [4] derived the above inequality in the proof of one of his propositions in [4].

In the following, by using Lemma 6, we construct the code $\Phi_n = (\mathcal{M}_n,\phi_n,\psi_n)$ attaining equality in the above inequality and satisfying $\lim\varepsilon(\Phi_n) = \epsilon$ as follows. Assume that $S_n(a) := \{-\frac{1}{n}\log p_n(\omega) < a\}$ and $\tilde{M}_n := |S_n(a)|$, and let $\tilde\phi_n$ be the one-to-one map from $S_n(a)$ to $\tilde{\mathcal{M}}_n := \{1,\dots,\tilde{M}_n\}$. Then, we can prove that $\tilde{M}_n \le e^{na}$. Moreover, we let $\hat\phi_n$ be a map satisfying the condition of Lemma 6 for the probability distribution $\hat p_n(\omega) := \frac{p_n(\omega)}{1-\epsilon_n}$ on the set $S_n(a)^c$ in the case of $M_n = \hat{M}_n := (1-\epsilon_n)e^{na}$ and $M'_n = \sqrt{\hat{M}_n}$, where $\epsilon_n := p_n(S_n(a))$ and the range of $\hat\phi_n$ is $\{\tilde{M}_n+1,\dots,\tilde{M}_n+\hat{M}_n\}$. Thus,

$$ D(\hat p_n\circ\hat\phi_n^{-1}\|p_{U,\hat{M}_n}) \le \log\big((1-\epsilon_n)e^{na}\big)\Big(\hat p_n\{-\tfrac{1}{n}\log p_n(\omega) < a\} + \frac{2}{\sqrt{\hat{M}_n}}\Big). $$

Since no element of $S_n(a)^c$ satisfies the condition $-\frac{1}{n}\log p_n(\omega) < a$, the inequality

$$ H(\hat p_n\circ\hat\phi_n^{-1}) \ge na + \log(1-\epsilon_n) - \frac{2(na+\log(1-\epsilon_n))}{\sqrt{\hat{M}_n}} $$

holds.

We define the code $\Phi_n = (\mathcal{M}_n,\phi_n,\psi_n)$ with the size $M_n = \tilde{M}_n+\hat{M}_n$ as follows: the encoding $\phi_n$ is defined from $\tilde\phi_n$ and $\hat\phi_n$, and the decoding $\psi_n$ on the subset $\tilde{\mathcal{M}}_n$ of $\mathcal{M}_n$ is the inverse map of $\tilde\phi_n$. Then, we evaluate $H(p_n\circ\phi_n^{-1})$ as

$$
\begin{aligned}
H(p_n\circ\phi_n^{-1}) &= H(e^{na},p_n) + (1-\epsilon_n)\big(H(\hat p_n\circ\hat\phi_n^{-1}) - \log(1-\epsilon_n)\big) \\
&\ge H(e^{na},p_n) + (1-\epsilon_n)\Big(na - \frac{2(na+\log(1-\epsilon_n))}{\sqrt{\hat{M}_n}}\Big) \\
&= n\int_0^a x\,F_n(dx) + na\big(1-F_n(a)\big) - \frac{2(1-\epsilon_n)(na+\log(1-\epsilon_n))}{\sqrt{\hat{M}_n}}.
\end{aligned}
$$

Dividing both sides by $n$ and taking the limit, we obtain the opposite inequality, which implies the equality. Similarly to the proofs above, we can prove that this code satisfies the condition $\lim\varepsilon(\Phi_n) = \epsilon$. ■


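G. Proof of (35) in Theorem 7

Proof of direct part: For any real numbers $\epsilon > 0$ and $a$ satisfying

$$ a < \underline{H}(1-e^{-\delta}|p), \tag{80} $$

we construct a sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ such that

$$ \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) < \delta, \qquad \lim\frac{1}{n}\log|\Psi_n| = a-\epsilon. $$

We define the probability distribution $\hat p_n(\omega) := \frac{p_n(\omega)}{p_n(S_n(a)^c)}$ on $S_n(a)^c := \{-\frac{1}{n}\log p_n(\omega) \ge a\}$ (where $S_n(a) := \{-\frac{1}{n}\log p_n(\omega) < a\}$). Since

$$ \hat p_n\Big\{\hat p_n(\omega) > \frac{1}{p_n(S_n(a)^c)\,e^{na}}\Big\} = \hat p_n\{p_n(\omega) > e^{-na}\} $$

and $S_n(a)^c\cap\{p_n(\omega) > e^{-na}\} = \emptyset$, there exists a map $\hat\phi_n$ from $S_n(a)^c$ to $\{1,\dots,\hat{M}_n\}$, where $\hat{M}_n := e^{n(a-\epsilon)}p_n(S_n(a)^c)$, such that the minimum probability of the distribution $\hat p_n\circ\hat\phi_n^{-1}$ is greater than

$$ \frac{1}{\hat{M}_n} - \frac{e^{-na}}{p_n(S_n(a)^c)} = \frac{1}{\hat{M}_n}\big(1-e^{-n\epsilon}\big). $$

Hence, we obtain

$$ D(p_{U,\hat{M}_n}\|\hat p_n\circ\hat\phi_n^{-1}) \le -\log\Big(\frac{1}{\hat{M}_n}\big(1-e^{-n\epsilon}\big)\Big) + \log\frac{1}{\hat{M}_n} = -\log\big(1-e^{-n\epsilon}\big) \to 0. \tag{81} $$

Next, we define a map $\phi_n$ from $\Omega_n$ to $\mathcal{M}_n = \{1,\dots,\hat{M}_n,\hat{M}_n+1\}$ by $\phi_n|_{S_n(a)^c} = \hat\phi_n$ and $\phi_n(S_n(a)) = \hat{M}_n+1$. Then,

$$ D(p_{U,\mathcal{M}_n}\|p_n\circ\phi_n^{-1}) = \frac{\hat{M}_n}{\hat{M}_n+1}D(p_{U,\hat{M}_n}\|\hat p_n\circ\hat\phi_n^{-1}) + \frac{\hat{M}_n}{\hat{M}_n+1}\log\frac{\hat{M}_n}{\hat{M}_n+1} - \frac{\hat{M}_n}{\hat{M}_n+1}\log p_n(S_n(a)^c) - \frac{1}{\hat{M}_n+1}\log\big((\hat{M}_n+1)\,p_n(S_n(a))\big). \tag{82} $$

Since (80) implies

$$ \limsup p_n(S_n(a)) < 1-e^{-\delta}, \tag{83} $$

the relations (81) and (82) guarantee that

$$ \limsup D(p_{U,\mathcal{M}_n}\|p_n\circ\phi_n^{-1}) = \limsup -\log p_n(S_n(a)^c) = \limsup -\log\big(1-p_n(S_n(a))\big) < \delta. $$

Moreover,

$$ \lim\frac{1}{n}\log|\mathcal{M}_n| = \lim\frac{1}{n}\log(\hat{M}_n+1) = \lim\frac{1}{n}\log\big(e^{n(a-\epsilon)}p_n(S_n(a)^c)\big) = a-\epsilon. $$

Proof of converse part: Assume that a sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ satisfies

$$ \lim\frac{1}{n}\log|\Psi_n| = R, \qquad \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \le \delta. $$

For any $\epsilon' > 0$, we define

$$
\begin{aligned}
M'_n &:= \big|\{i \mid -\tfrac{1}{n}\log p_n\circ\phi_n^{-1}(i) < R-\epsilon'\}\big|, \\
e_n &:= p_n\circ\phi_n^{-1}\{-\tfrac{1}{n}\log p_n\circ\phi_n^{-1}(i) < R-\epsilon'\} \ge p_n\{-\tfrac{1}{n}\log p_n(\omega) < R-\epsilon'\}.
\end{aligned}
\tag{84}
$$

The information processing inequality of KL-divergence guarantees that

$$ D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \ge \frac{M'_n}{|\Psi_n|}\Big(\log\frac{M'_n}{|\Psi_n|} - \log e_n\Big) + \Big(1-\frac{M'_n}{|\Psi_n|}\Big)\Big(\log\Big(1-\frac{M'_n}{|\Psi_n|}\Big) - \log(1-e_n)\Big). $$

Since $M'_n \le e^{n(R-\epsilon')}$ and $\lim\frac{1}{n}\log|\Psi_n| = R$,

$$ \frac{M'_n}{|\Psi_n|} \to 0, \qquad \frac{M'_n}{|\Psi_n|}\log\frac{M'_n}{|\Psi_n|} \to 0. $$

Therefore, taking the limit $\limsup$, we have

$$ \delta \ge \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \ge \limsup -\log(1-e_n) = -\log\big(1-\limsup e_n\big), $$

which implies

$$ \limsup e_n \le 1-e^{-\delta}. $$

Thus, the inequality (84) yields

$$ \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < R-\epsilon'\} \le 1-e^{-\delta}. $$

Therefore,

$$ R-\epsilon' \le \underline{H}(1-e^{-\delta}|p). $$

Since $\epsilon'$ is arbitrary, we obtain

$$ S^1(\delta|p) \le \underline{H}(1-e^{-\delta}|p). $$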

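H. Proof of (36) in Theorem 7

First, by using the following two lemmas, we will prove (36).

Lemma 7: When three sequences of positive numbers $a_n$, $b_n$, and $c_n$ satisfy

$$ a_n \le b_n + c_n, $$

then

$$ \limsup\frac{1}{n}\log a_n \le \max\Big\{\limsup\frac{1}{n}\log b_n,\ \limsup\frac{1}{n}\log c_n\Big\}. $$

Lemma 8: The inequality

$$ \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\} \ge \sup_a\{\xi(a)\mid a-\xi(a)<\delta\} \tag{86} $$

holds, where $\xi(a)$ is defined as

$$ \xi(a) := \limsup\frac{1}{n}\log\big|\{-\tfrac{1}{n}\log p_n(\omega) < a\}\big|. $$

Proof of direct part: We will prove

$$ S^2(\delta|p) \ge \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\}. $$

That is, for any real numbers $\epsilon > 0$ and $a$ satisfying $\sigma(a) < \delta$, we construct a sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ such that

$$ \limsup\frac{1}{n}D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) < \delta, \qquad \lim\frac{1}{n}\log|\Psi_n| = a-\sigma(a)-\epsilon. \tag{85} $$

Similarly to the proof of (35), we define $\hat p_n(\omega)$, $S_n(a)^c$, $S_n(a)$, and $\phi_n$. Using (81) and (82), we have

$$ \lim\frac{1}{n}D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) = \lim -\frac{1}{n}\log p_n(S_n(a)^c) = \sigma(a) < \delta. $$

Moreover,

$$ \lim\frac{1}{n}\log|\mathcal{M}_n| = \lim\frac{1}{n}\log(\hat{M}_n+1) = \lim\frac{1}{n}\log\big(e^{n(a-\epsilon)}p_n(S_n(a)^c)\big) = a-\epsilon-\sigma(a). $$

Proof of converse part: We will prove

$$ S^2(\delta|p) \le \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\}. $$

That is, for any sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ satisfying $\limsup\frac{1}{n}D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) < \delta$, we will prove that

$$ R := \limsup\frac{1}{n}\log|\mathcal{M}_n| \le \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\}. $$

Let $\{n_k\}$ be a subsequence such that $\lim_k\frac{1}{n_k}\log|\mathcal{M}_{n_k}| = \limsup\frac{1}{n}\log|\mathcal{M}_n|$. We choose the real number $a_0$ as

$$ a_0 := \inf\Big\{a \;\Big|\; \lim_k p_{U,M_{n_k}}\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) < a\} > 0\Big\}. $$

For any real number $\epsilon_0 > 0$, the relation $\lim_k p_{U,M_{n_k}}\{-\frac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) < a_0-\epsilon_0\} = 0$ holds. Since

$$ n_k(a_0-\epsilon_0)\,p_{U,M_{n_k}}\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) \ge a_0-\epsilon_0\} - \log M_{n_k} \le -\log M_{n_k} - \sum_i p_{U,M_{n_k}}(i)\log p_{n_k}\circ\phi_{n_k}^{-1}(i) = D(p_{U,M_{n_k}}\|p_{n_k}\circ\phi_{n_k}^{-1}), $$

we have

$$ (a_0-\epsilon_0)-R = \lim_k\Big((a_0-\epsilon_0)\,p_{U,M_{n_k}}\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) \ge a_0-\epsilon_0\} - \frac{1}{n_k}\log M_{n_k}\Big) \le \limsup_k\frac{1}{n_k}D(p_{U,M_{n_k}}\|p_{n_k}\circ\phi_{n_k}^{-1}) < \delta. $$

Taking the limit $\epsilon_0\to 0$, we obtain

$$ a_0-R \le \limsup_k\frac{1}{n_k}D(p_{U,M_{n_k}}\|p_{n_k}\circ\phi_{n_k}^{-1}) < \delta. \tag{87} $$

Next, we choose a real number $\epsilon$ such that

$$ 0 < \epsilon < \delta-(a_0-R). \tag{88} $$

Then, there exists a real number $\alpha > 0$ such that

$$ \lim_k p_{U,M_{n_k}}\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) < a_0+\epsilon\} > \alpha. $$

Thus,

$$ \big|\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) < a_0+\epsilon\}\big| \ge \alpha M_{n_k} $$

for sufficiently large $n_k$. An output $i$ with $p_{n_k}\circ\phi_{n_k}^{-1}(i) > e^{-n_k(a_0+\epsilon)}$ either contains an element $\omega$ with $-\frac{1}{n_k}\log p_{n_k}(\omega) < a_0+\epsilon$, or collects mass greater than $e^{-n_k(a_0+\epsilon)}$ from elements with $-\frac{1}{n_k}\log p_{n_k}(\omega) \ge a_0+\epsilon$. Hence, we can evaluate

$$ \alpha M_{n_k} \le \big|\{-\tfrac{1}{n_k}\log p_{n_k}\circ\phi_{n_k}^{-1}(i) < a_0+\epsilon\}\big| \le \frac{p_{n_k}\{-\frac{1}{n_k}\log p_{n_k}(\omega) \ge a_0+\epsilon\}}{e^{-n_k(a_0+\epsilon)}} + \big|\{-\tfrac{1}{n_k}\log p_{n_k}(\omega) < a_0+\epsilon\}\big|. $$

Using Lemma 7, we have

$$ \max\{\xi(a_0+\epsilon),\ (a_0+\epsilon)-\sigma(a_0+\epsilon)\} \ge R. \tag{89} $$

If $\xi(a_0+\epsilon) \ge (a_0+\epsilon)-\sigma(a_0+\epsilon)$, by combining (88) and (89), we can show

$$ (a_0+\epsilon)-\xi(a_0+\epsilon) < \delta. $$

Therefore, we obtain

$$ R \le \sup_a\{\xi(a)\mid a-\xi(a)<\delta\} \le \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\}. $$

If $\xi(a_0+\epsilon) < (a_0+\epsilon)-\sigma(a_0+\epsilon)$, by combining (88) and (89), we can show

$$ \sigma(a_0+\epsilon) < \delta. $$

Therefore, we obtain

$$ R \le \sup_a\{a-\sigma(a)\mid\sigma(a)<\delta\}. $$

■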

Proof of Lemma 7: Since

$$ b_n + c_n \le 2\max\{b_n, c_n\}, $$

we have

$$ \frac{1}{n}\log a_n \le \max\Big\{\frac{\log 2}{n}+\frac{1}{n}\log b_n,\ \frac{\log 2}{n}+\frac{1}{n}\log c_n\Big\}. $$

Taking the limit $\limsup$, we obtain

$$ \limsup\frac{1}{n}\log a_n \le \max\Big\{\limsup\frac{1}{n}\log b_n,\ \limsup\frac{1}{n}\log c_n\Big\}. $$

■
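The only ingredient in the proof of Lemma 7 is $b_n + c_n \le 2\max\{b_n, c_n\}$, so the exponent of a sum is asymptotically the larger of the two exponents. A short Python check follows; it is illustrative only, and the exponential sequences $b_n = e^{0.3n}$, $c_n = e^{0.5n}$ are assumptions chosen for the example. It computes $\frac{1}{n}\log(b_n+c_n)$ in log space and confirms the $\frac{\log 2}{n}$ slack.

```python
import math

def log_sum_exp(x: float, y: float) -> float:
    """log(e^x + e^y) computed without overflow."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def normalized_exponent_of_sum(log_b: float, log_c: float, n: int) -> float:
    """(1/n) log(b_n + c_n), given log b_n and log c_n."""
    return log_sum_exp(log_b, log_c) / n

for n in (10, 100, 10_000):
    lhs = normalized_exponent_of_sum(0.3 * n, 0.5 * n, n)  # b_n = e^{0.3n}, c_n = e^{0.5n}
    bound = max(0.3, 0.5) + math.log(2) / n
    print(n, lhs, bound)
```

Working with $\log b_n$ and $\log c_n$ rather than $b_n$ and $c_n$ avoids floating-point overflow for large $n$.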


Proof of Lemma 8: In this proof, the following lemma plays an important role.

Lemma 9 (Hayashi [12, Lemma 13]): If two monotonically increasing functions $f$ and $g$ satisfy

$$ -f(a)+a \ge g(b) \quad\text{if } f(a) > f(b), \tag{90} $$

then

$$ \sup_a\{a-g(a)\mid g(a)<\delta\} \ge \sup_a\{f(a)\mid a-f(a)<\delta\}. $$

Remark 7: This lemma is essentially the one obtained by Hayashi [12]. However, this statement is slightly different from that in Hayashi [12].

Proof: We prove Lemma 9 by contradiction. Assume that there exists a real number $a_0$ such that

$$ a_0-f(a_0) < \delta, \tag{91} $$

$$ f(a_0) > \sup_a\{a-g(a)\mid g(a)<\delta\}. \tag{92} $$

We define $a_1 := \inf_a\{a \mid f(a) = f(a_0)\}$ and assume that $a_0 > a_1$. For any real number $\epsilon$ with $0 < \epsilon < a_0-a_1$, the inequality $f(a_1-\epsilon) < f(a_1+\epsilon)$ holds. Using (90), we have

$$ g(a_1-\epsilon) \le -f(a_1+\epsilon)+a_1+\epsilon = -f(a_0)+a_1+\epsilon < \delta+(a_1-a_0)+\epsilon < \delta. $$

Thus,

$$ \sup_a\{a-g(a)\mid g(a)<\delta\} \ge a_1-\epsilon-g(a_1-\epsilon) \ge a_1-\epsilon-(a_1+\epsilon)+f(a_1+\epsilon) = f(a_0)-2\epsilon. $$

Taking the limit $\epsilon\to 0$, we obtain $\sup_a\{a-g(a)\mid g(a)<\delta\} \ge f(a_0)$, which contradicts (92).

Next, we treat the case $a_0 = a_1$. The inequality $f(a_0) > f(a_0-\epsilon)$ holds for all $\epsilon > 0$. Using (90), we have $g(a_0-\epsilon) \le -f(a_0)+a_0 < \delta$. Thus,

$$ \sup_a\{a-g(a)\mid g(a)<\delta\} \ge a_0-\epsilon-g(a_0-\epsilon) \ge a_0-\epsilon-a_0+f(a_0) = -\epsilon+f(a_0). $$

This also contradicts (92). ■

For $a \ge b$, the sum $\sum_{\omega\in A}(p_n(\omega)-e^{-na})$ is minimized over subsets $A$ by the set $\{p_n(\omega) \le e^{-na}\}$, so

$$ \sum_{\omega:\,p_n(\omega)\le e^{-na}}\big(p_n(\omega)-e^{-na}\big) \le \sum_{\omega:\,p_n(\omega)\le e^{-nb}}\big(p_n(\omega)-e^{-na}\big). $$

By adding $e^{-na}|\Omega_n|$ to both sides, we have

$$ p_n\{p_n-e^{-na}\le 0\} + e^{-na}\big|\{p_n-e^{-na}>0\}\big| \le p_n\{p_n-e^{-nb}\le 0\} + e^{-na}\big|\{p_n-e^{-nb}>0\}\big|, $$

which implies

$$ \big|\{p_n-e^{-na}>0\}\big| \le e^{na}p_n\{p_n-e^{-nb}\le 0\} + \big|\{p_n-e^{-nb}>0\}\big|. $$

Thus, Lemma 7 guarantees that

$$ \xi(a) \le \max\{a-\sigma(b),\ \xi(b)\}. $$

Using this relation, we obtain

$$ \xi(a) \le a-\sigma(b) \quad\text{if } \xi(b) < \xi(a). $$

Therefore, by applying Lemma 9 to the case of $f = \xi$ and $g = \sigma$, we can show (86). ■


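I. Proof of Theorem 8

Proof of the inequality: We define the probability distribution function $F_n$ on the real numbers $\mathbb{R}$ as

$$ F_n(x) := p_n\{-\tfrac{1}{n}\log p_n(\omega) < H(p)+\tfrac{x}{\sqrt{n}}\} \tag{93} $$

for a probability distribution $p_n$. Then, the relation

$$ H(M_n,p_n) = \int_{-\infty}^{b_n}\big(\sqrt{n}x+nH(p)\big)\,F_n(dx) \tag{94} $$

holds, where $b_n := \frac{1}{\sqrt{n}}\big(\log M_n - nH(p)\big)$. Thus, Lemma 5 yields the inequality

$$
\begin{aligned}
H(p_n\circ\phi_n^{-1}) &\le \int_{-\infty}^{b_n}\big(\sqrt{n}x+nH(p)\big)\,F_n(dx) + p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\sqrt{n}b_n+nH(p)-\log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big) \\
&= \sqrt{n}\int_{-\infty}^{b_n}x\,F_n(dx) + nH(p) + p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(\sqrt{n}b_n-\log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big).
\end{aligned}
$$

Therefore, the inequality

$$ \frac{1}{\sqrt{n}}\big(H(p_n\circ\phi_n^{-1})-nH(p)\big) \le \int_{-\infty}^{b_n}x\,F_n(dx) + p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big(b_n-\frac{1}{\sqrt{n}}\log p_n\{p_n(\omega)\le\tfrac{1}{M_n}\}\Big) $$

holds. Taking the limit, we obtain the desired inequality. ■

Proof of the existence part: In the following, by using Lemma 6, we construct the code $\Phi_n = (\mathcal{M}_n,\phi_n,\psi_n)$ attaining equality in the above inequality and satisfying $\lim\varepsilon(\Phi_n) = \epsilon$ as follows. Let $\tilde\phi_n$ be the one-to-one map from

$$ S_n(H(p),b) := \{-\tfrac{1}{n}\log p_n(\omega) < H(p)+\tfrac{b}{\sqrt{n}}\} $$

to $\tilde{\mathcal{M}}_n = \{1,\dots,\tilde{M}_n\}$, where $\tilde{M}_n = |S_n(H(p),b)|$. Then, the inequality $\tilde{M}_n \le e^{nH(p)+b\sqrt{n}}$ holds. Furthermore, we define $\hat\phi_n$ as a map satisfying the condition of Lemma 6 for the probability distribution $\hat p_n(\omega) := \frac{p_n(\omega)}{1-\epsilon_n}$ on the set $S_n(H(p),b)^c$ in the case of $M_n = \hat{M}_n := (1-\epsilon_n)e^{nH(p)+b\sqrt{n}}$ and $M'_n = \sqrt{\hat{M}_n}$, where $\epsilon_n := p_n(S_n(H(p),b))$ and the range of $\hat\phi_n$ is $\{\tilde{M}_n+1,\dots,\tilde{M}_n+\hat{M}_n\}$. Thus,

$$ D(\hat p_n\circ\hat\phi_n^{-1}\|p_{U,\hat{M}_n}) \le \log\big((1-\epsilon_n)e^{nH(p)+b\sqrt{n}}\big)\Big(\hat p_n\{-\tfrac{1}{n}\log p_n(\omega) < H(p)+\tfrac{b}{\sqrt{n}}\} + \frac{2}{\sqrt{\hat{M}_n}}\Big). $$

Because no element of $S_n(H(p),b)^c$ satisfies the condition $-\frac{1}{n}\log p_n(\omega) < H(p)+\frac{b}{\sqrt{n}}$, the inequality

$$ H(\hat p_n\circ\hat\phi_n^{-1}) \ge \log(1-\epsilon_n) + nH(p) + \sqrt{n}b - \frac{2\big(nH(p)+\sqrt{n}b+\log(1-\epsilon_n)\big)}{\sqrt{\hat{M}_n}} $$

holds.

We define the code $\Phi_n = (\mathcal{M}_n,\phi_n,\psi_n)$ with the size $M_n = \tilde{M}_n+\hat{M}_n$ similarly to the proof of Theorem 6. Then,

$$
\begin{aligned}
H(p_n\circ\phi_n^{-1}) &= H(e^{nH(p)+b\sqrt{n}},p_n) + (1-\epsilon_n)\big(H(\hat p_n\circ\hat\phi_n^{-1})-\log(1-\epsilon_n)\big) \\
&\ge H(e^{nH(p)+b\sqrt{n}},p_n) + (1-\epsilon_n)\Big(nH(p)+\sqrt{n}b-\frac{2\big(nH(p)+\sqrt{n}b+\log(1-\epsilon_n)\big)}{\sqrt{\hat{M}_n}}\Big) \\
&= \sqrt{n}\int_{-\infty}^{b}x\,F_n(dx) + nH(p) + \sqrt{n}b\big(1-F_n(b)\big) - \frac{2(1-\epsilon_n)\big(nH(p)+\sqrt{n}b+\log(1-\epsilon_n)\big)}{\sqrt{\hat{M}_n}}.
\end{aligned}
$$

By subtracting $nH(p)$ from both sides, dividing both sides by $\sqrt{n}$, and taking the limit, we obtain the opposite inequality, which implies the equality. As in the proof of Theorem 6, we can prove that this code satisfies the condition $\lim\varepsilon(\Phi_n) = \epsilon$. ■


J. Proof of Theorem 9

Proof of direct part: For any real numbers $\epsilon > 0$ and $b$ satisfying

$$ b < \underline{H}(1-e^{-\delta},a|p), \tag{95} $$

we construct a sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ such that

$$ \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) < \delta, \qquad \lim\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} = b-\epsilon. $$

We define the probability distribution $\hat p_n(\omega) := \frac{p_n(\omega)}{p_n(S_n(a,b)^c)}$ on $S_n(a,b)^c := \{-\frac{1}{n}\log p_n(\omega) \ge a+\frac{b}{\sqrt{n}}\}$ (where $S_n(a,b) := \{-\frac{1}{n}\log p_n(\omega) < a+\frac{b}{\sqrt{n}}\}$). Then, for any $\epsilon > 0$, similarly to our proof of (35) in Theorem 7, there exists an operation $\hat\phi_n$ from $S_n(a,b)^c$ to $\{1,\dots,\hat{M}_n\}$, where $\hat{M}_n := e^{na+\sqrt{n}(b-\epsilon)}p_n(S_n(a,b)^c)$, such that

$$ D(p_{U,\hat{M}_n}\|\hat p_n\circ\hat\phi_n^{-1}) \le -\log\big(1-e^{-\sqrt{n}\epsilon}\big) \to 0. $$

Next, we define a map $\phi_n$ from $\Omega_n$ to $\mathcal{M}_n = \{1,\dots,\hat{M}_n,\hat{M}_n+1\}$ by $\phi_n|_{S_n(a,b)^c} = \hat\phi_n$ and $\phi_n(S_n(a,b)) = \hat{M}_n+1$. Then, we obtain

$$ D(p_{U,\mathcal{M}_n}\|p_n\circ\phi_n^{-1}) = \frac{\hat{M}_n}{\hat{M}_n+1}D(p_{U,\hat{M}_n}\|\hat p_n\circ\hat\phi_n^{-1}) + \frac{\hat{M}_n}{\hat{M}_n+1}\log\frac{\hat{M}_n}{\hat{M}_n+1} - \frac{\hat{M}_n}{\hat{M}_n+1}\log p_n(S_n(a,b)^c) - \frac{1}{\hat{M}_n+1}\log\big((\hat{M}_n+1)\,p_n(S_n(a,b))\big). \tag{96} $$

Since the inequality (95) guarantees

$$ \limsup p_n(S_n(a,b)) < 1-e^{-\delta}, $$

we have

$$ \limsup D(p_{U,\mathcal{M}_n}\|p_n\circ\phi_n^{-1}) = \limsup -\log p_n(S_n(a,b)^c) = \limsup -\log\big(1-p_n(S_n(a,b))\big) < \delta. $$

Moreover,

$$ \lim\frac{1}{\sqrt{n}}\log\frac{|\mathcal{M}_n|}{e^{na}} = \lim\frac{1}{\sqrt{n}}\log\frac{\hat{M}_n+1}{e^{na}} = \lim\frac{1}{\sqrt{n}}\log\frac{e^{na+\sqrt{n}(b-\epsilon)}p_n(S_n(a,b)^c)}{e^{na}} = b-\epsilon. $$

Proof of converse part: Assume that a sequence $\Psi_n = (\mathcal{M}_n,\phi_n)$ satisfies

$$ \lim\frac{1}{\sqrt{n}}\log\frac{|\Psi_n|}{e^{na}} = R, \qquad \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \le \delta. \tag{97} $$

For any $\epsilon' > 0$, we define

$$
\begin{aligned}
M'_n &:= \big|\{i \mid -\tfrac{1}{n}\log p_n\circ\phi_n^{-1}(i) < a+\tfrac{R-\epsilon'}{\sqrt{n}}\}\big|, \\
e_n &:= p_n\circ\phi_n^{-1}\{-\tfrac{1}{n}\log p_n\circ\phi_n^{-1}(i) < a+\tfrac{R-\epsilon'}{\sqrt{n}}\} \ge p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{R-\epsilon'}{\sqrt{n}}\}.
\end{aligned}
\tag{98}
$$

The information processing inequality of KL-divergence guarantees that

$$ D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \ge \frac{M'_n}{|\Psi_n|}\Big(\log\frac{M'_n}{|\Psi_n|}-\log e_n\Big) + \Big(1-\frac{M'_n}{|\Psi_n|}\Big)\Big(\log\Big(1-\frac{M'_n}{|\Psi_n|}\Big)-\log(1-e_n)\Big). $$

Since $M'_n \le e^{na+\sqrt{n}(R-\epsilon')}$ and (97),

$$ \frac{M'_n}{|\Psi_n|} \to 0, \qquad \frac{M'_n}{|\Psi_n|}\log\frac{M'_n}{|\Psi_n|} \to 0. $$

Therefore, taking the limit $\limsup$, we have

$$ \delta \ge \limsup D(p_{U,M_n}\|p_n\circ\phi_n^{-1}) \ge \limsup -\log(1-e_n) = -\log\big(1-\limsup e_n\big), $$

which implies

$$ \limsup e_n \le 1-e^{-\delta}. $$

Thus, the inequality (98) yields

$$ \limsup p_n\{-\tfrac{1}{n}\log p_n(\omega) < a+\tfrac{R-\epsilon'}{\sqrt{n}}\} \le 1-e^{-\delta}. $$

Therefore,

$$ R-\epsilon' \le \underline{H}(1-e^{-\delta},a|p). $$

Since $\epsilon'$ is arbitrary, we obtain

$$ S^1(\delta,a|p) \le \underline{H}(1-e^{-\delta},a|p). $$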


K. Proof of Theorem 10

This theorem is proved by the type method. Let $\mathcal{T}_n$ be the set of $n$-th types, i.e., the set of empirical distributions of $n$ observations. We denote the set of elements of $\Omega^n$ corresponding to $P$ by $T_P \subset \Omega^n$, and define a subset $T_n(a,b)$ of the set $\Omega^n$ as

$$ T_n(a,b) := \bigcup_{P\in\mathcal{T}_n:\ \log|T_P| \le an+b\sqrt{n}} T_P. $$

Using this notation, we define the encoding $\phi_n$ from $\Omega^n$ to $T_n(a,b)\cup\{0\}$:

$$ \phi_n(\omega) := \begin{cases}\omega & \text{if } \omega\in T_n(a,b)\\ 0 & \text{if } \omega\notin T_n(a,b).\end{cases} $$

We also define the decoding $\psi_n$ such that $\psi_n(\omega) = \omega$ for all $\omega\in T_n(a,b)$. The relation

$$ \varepsilon_{P^n}(\Phi_n) = 1-P^n(T_n(a,b)) $$

holds. Then, the type counting lemma guarantees that

$$ |T_n(a,b)| \le (n+1)^d e^{na+b\sqrt{n}}, $$

which implies

$$ \limsup\frac{1}{\sqrt{n}}\log\frac{|\Phi_n|}{e^{na}} \le b. \tag{99} $$
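The type counting lemma used for (99) rests on the fact that the number of types of length-$n$ sequences over a $d$-letter alphabet is $\binom{n+d-1}{d-1}$, which never exceeds $(n+1)^d$. A quick Python check of this counting bound follows; it is an illustration with hypothetical helper names, not part of the proof.

```python
import math
from itertools import product

def number_of_types(n: int, d: int) -> int:
    """Number of empirical distributions (types) of n samples over d letters."""
    return math.comb(n + d - 1, d - 1)

def brute_force_types(n: int, d: int) -> int:
    """Count types directly as the set of distinct letter-count vectors."""
    return len({tuple(seq.count(a) for a in range(d))
                for seq in product(range(d), repeat=n)})

print(number_of_types(6, 3), brute_force_types(6, 3), (6 + 1) ** 3)
```

Because the number of types grows only polynomially in $n$, the factor $(n+1)^d$ contributes $d\log(n+1)/\sqrt{n} \to 0$ to the normalized quantity in (99).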

On the other hand, the set $\{-\log P^n(\omega) < na+b\sqrt{n}\}$ can be expressed as

$$ \{-\log P^n(\omega) < na+b\sqrt{n}\} = \{P^n(\omega) > e^{-na-b\sqrt{n}}\} = \bigcup_{P'\in\mathcal{T}_n:\ P^n(\omega) > e^{-na-b\sqrt{n}}\ \text{for}\ \omega\in T_{P'}} T_{P'}. $$

Hence, when a type $P'\in\mathcal{T}_n$ satisfies $P^n(\omega) > e^{-na-b\sqrt{n}}$ for $\omega\in T_{P'}$, the inequality $P^n(T_{P'}) \le 1$ yields

$$ |T_{P'}| \le P^n(\omega)^{-1} < e^{na+b\sqrt{n}}. $$

Thus,

$$ \{-\log P^n(\omega) < na+b\sqrt{n}\} \subset T_n(a,b). $$

Therefore, if the probability distribution $P$ satisfies $H(P) = a$, then

$$ \Phi\Big(\frac{b}{\sqrt{V_P}}\Big) = \lim P^n\{-\log P^n(\omega) < na+b\sqrt{n}\} \le \liminf P^n(T_n(a,b)) = 1-\limsup\varepsilon_{P^n}(\Phi_n), $$

i.e.,

$$ \limsup\varepsilon_{P^n}(\Phi_n) \le 1-\Phi\Big(\frac{b}{\sqrt{V_P}}\Big). \tag{100} $$

Since the r.h.s. of (100) is optimal under the condition (99), the equality in (100) holds. Conversely, since the r.h.s. of (99) is optimal under the condition (100), the equality in (99) holds. Thus, we obtain Theorem 10. ■

In universal variable-length source coding, the second term of the expected coding length is of order $\log n$. However, as discussed in the above proof, this term is negligible in the second order asymptotics of fixed-length source coding.

Thus, the central limit theorem plays an important role in both variable-length and fixed-length source coding, although it enters the two problems in different ways.


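L. Proof of Theorem 11

Using the type method, we define a map $\phi_n$ from $\Omega^n$ to $\mathcal{M}_n := \{1,\dots,\frac{1}{n}e^{na+b\sqrt{n}}\}$ as follows. The map $\phi_n$ maps any element of $T_n(a,b)$ to 1. On the other hand, the map $\phi_n$ restricted to a subset $T_{P'}\subset T_n(a,b)^c$ is defined as the map from $T_{P'}$ to $\mathcal{M}_n$ satisfying the conditions of Lemma 3 in the case of $M'_n = |T_{P'}|$.

Then, the equality of (100) guarantees

$$
\begin{aligned}
\varepsilon_{P^n}(\Psi_n) &\le \sum_{T_{P'}\subset T_n(a,b)^c} P^n(T_{P'})\sqrt{\frac{e^{na+b\sqrt{n}}}{n|T_{P'}|}} + \sum_{T_{P'}\subset T_n(a,b)} P^n(T_{P'}) \\
&\le \frac{1}{\sqrt{n}}P^n(T_n(a,b)^c) + P^n(T_n(a,b)) \\
&\to \begin{cases} 0 & H(P) > a \\ \Phi\Big(\dfrac{b}{\sqrt{V_P}}\Big) & H(P) = a. \end{cases}
\end{aligned}
$$

Therefore, we obtain (61). ■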


X. Concluding remarks and future study

We proved that Folklore for source coding holds neither for the variational distance criterion nor for the KL-divergence criteria. Of course, since our criteria are more restrictive than Han's criterion (30), there is no contradiction. However, it is necessary to discuss which criterion is more suitable for treating Folklore for source coding. This is left to future research.

While we focused on the relation between source coding and intrinsic randomness only in the fixed-length case, the compression schemes used in practice are variable-length. In the variable-length setting, if we use a code whose coding length is decided only from the empirical distribution (the Lynch–Davisson code) in the i.i.d. case, the conditional distribution of the obtained data is the uniform distribution. That is, in the variable-length setting, there exists a code attaining the entropy rate with no error in both settings. Thus, a result different from the fixed-length setting can be expected in the variable-length setting.

Furthermore, this type of second order asymptotics can be extended to other topics in information theory. Indeed, in the case of channel coding, resolvability, and simple hypothesis testing, lemmas corresponding to Lemmas 1–4 have been obtained by Han [4]. Thus, it is not difficult to derive theorems corresponding to Theorem 3. However, in channel coding it is difficult to calculate the quantities corresponding to $\overline{H}(\epsilon,a|p)$ and $\underline{H}(\epsilon,a|p)$ even in the i.i.d. case. On the other hand, similarly to fixed-length source coding and intrinsic randomness, we can treat the second order asymptotics of the other two problems in the i.i.d. case. In particular, when we discuss simple hypothesis testing with hypotheses $p$ and $q$ from the viewpoint of second order asymptotics, we optimize the second order coefficient $b$ of the first error probability $e^{-nD(p\|q)-\sqrt{n}b}$ under the constraint that the second error probability is less than the fixed constant $\epsilon$. There is no difficulty in this problem. However, there is considerable difficulty in the quantum setting of this problem.

Third order asymptotics would also be of interest, but it 
appears difficult. In the i.i.d. case, the issue is how far the 
distribution of $\sqrt{n}(-\frac{1}{n}\log P^n - H(P))$ deviates from 
the normal distribution. If the next order is a constant-order 
term in $\log P^n$, methods similar to those in this paper cannot 
be used. This is an interesting open problem. 
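To make the third-order issue concrete, the sketch below computes, for an i.i.d. Bernoulli($p$) source (an assumption chosen for illustration), the exact probability that $\sqrt{n}(-\frac{1}{n}\log P^n(X^n) - H(P)) \le t$ and compares it with the normal approximation $\Phi(t/\sigma)$, where $\sigma^2 = \mathrm{Var}[-\log P(X)]$. The difference between the two is the deviation from normality discussed above; all function names are hypothetical.

```python
from math import erf, exp, lgamma, log, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def entropy_and_variance(p):
    """Entropy H(P) and variance of -log P(X), in nats, for Bernoulli(p)."""
    h = -p * log(p) - (1 - p) * log(1 - p)
    second_moment = p * log(p) ** 2 + (1 - p) * log(1 - p) ** 2
    return h, second_moment - h ** 2

def exact_cdf(n, p, t):
    """P^n{ sqrt(n) * (-(1/n) log P^n(X^n) - H(P)) <= t } for i.i.d. Bernoulli(p)."""
    h, _ = entropy_and_variance(p)
    total = 0.0
    for k in range(n + 1):  # all sequences with k ones share one probability
        log_prob = k * log(p) + (n - k) * log(1 - p)
        if sqrt(n) * (-log_prob / n - h) <= t:
            # log of C(n, k) p^k (1-p)^(n-k), computed in the log domain to avoid underflow
            log_weight = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + log_prob
            total += exp(log_weight)
    return total

def normal_approximation(p, t):
    """Central limit approximation Phi(t / sigma)."""
    _, var = entropy_and_variance(p)
    return normal_cdf(t / sqrt(var))

for n in (100, 1000, 10000):
    print(n, exact_cdf(n, 0.3, 0.2) - normal_approximation(0.3, 0.2))
```

Because the statistic takes only $n+1$ values for a binary source, the exact distribution is a lattice distribution, and its discrepancy from $\Phi(t/\sigma)$ contains an oscillating term; this is one concrete form of the next-order behavior that the second order analysis ignores.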

Acknowledgments 

The author would like to thank Professor Hiroshi Imai of 
the QCI project for support. He is grateful to Mr. Tsuyoshi 
Ito and Dr. Mitsuru Hamada for useful discussions. He also 
appreciates the reviewers’ helpful comments. 

Appendix 

Proof of \47\ => 14(S*l l.- The relations 

$$
D(p_n \circ \phi_n^{-1} \| p_{U,M_n}) = \log M_n - H(p_n \circ \phi_n^{-1})
= H(p_{U,M_n}) - H(p_n \circ \phi_n^{-1})
$$

hold. 

If $d(p_n \circ \phi_n^{-1}, p_{U,M_n}) < 1/4$, Fannes’ inequality [16] (see 
also Csiszár and Körner [14]) implies 

$$
|H(p_{U,M_n}) - H(p_n \circ \phi_n^{-1})|
\le - d(p_n \circ \phi_n^{-1}, p_{U,M_n}) \log \frac{d(p_n \circ \phi_n^{-1}, p_{U,M_n})}{M_n}.
$$

Dividing the above by $n$, we have 

$$
\frac{1}{n} D(p_n \circ \phi_n^{-1} \| p_{U,M_n})
\le \frac{d(p_n \circ \phi_n^{-1}, p_{U,M_n}) \left( \log M_n - \log d(p_n \circ \phi_n^{-1}, p_{U,M_n}) \right)}{n}.
$$

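Since $\lim \frac{1}{n} \log M_n < \infty$ and $d(p_n \circ \phi_n^{-1}, p_{U,M_n}) \to 0$, the right-hand side tends to 0, which yields the desired implication. 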
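The continuity bound used above can be spot-checked numerically. The sketch below assumes the form given by Csiszár and Körner: if $\theta = \sum_x |P(x) - Q(x)| \le 1/2$ on an alphabet $\mathcal{X}$, then $|H(P) - H(Q)| \le -\theta \log(\theta/|\mathcal{X}|)$. The normalization of $d$ in the text may differ from this $L_1$ distance, and the helper `random_close_pair` is a hypothetical test generator.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def continuity_bound(p, q):
    """-theta * log(theta / |X|) with theta the L1 distance (assumed 0 < theta <= 1/2)."""
    theta = l1_distance(p, q)
    if not 0 < theta <= 0.5:
        raise ValueError("bound assumes 0 < theta <= 1/2")
    return -theta * math.log(theta / len(p))

def random_close_pair(size, eps, rng):
    """Hypothetical generator: a random distribution and a small perturbation of it."""
    p = [rng.random() for _ in range(size)]
    total = sum(p)
    p = [x / total for x in p]
    q = [x + eps * rng.random() for x in p]
    total = sum(q)
    return p, [x / total for x in q]

rng = random.Random(0)
p, q = random_close_pair(8, 0.01, rng)
print(abs(entropy(p) - entropy(q)), continuity_bound(p, q))
```

The bound grows only like $\theta \log(|\mathcal{X}|/\theta)$, so with $|\mathcal{X}| = M_n$ it contributes the $d \log M_n$ term above, which vanishes after division by $n$ when $d \to 0$ and $\frac{1}{n}\log M_n$ stays bounded.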